The Structure of Comment Networks


Vahed Qazvinian
Department of EECS, University of Michigan, Ann Arbor, MI

Jafar Adibi
Center for Advanced Research, PricewaterhouseCoopers LLP, San Jose, CA

Abtin Rasoulian
Department of Informatics, Technischen Universität München, Munich, Germany

Abstract

Blogs form an important online social network, as they are maintained periodically and are easily accessible. An important aspect of any blogspace is the relationship among bloggers based on comments. This aspect has, however, been largely ignored in previous work. In this work, we study the evolution of this specific type of social network, the blogosphere comment graph. We observe how the comment network evolves as bloggers place comments on each other's blogs, and we study its local patterns. We investigate the densification of this network and observe a high correlation between the number of comments placed and received. Finally, we propose a growth model that best describes the behavior of users who place comments.

1 Introduction

Social networks are not just about the crowds. Adar et al. [7] indicate that a typical science journal might have 500 readers a day, a typical science blog has the same number of readers, and a popular one has as many as 6,000 daily viewers. In contrast, Serendip (which is a kind of group blog) gets 15,000 visitors a day. Blogs connect people and groups to each other through links and comments. Therefore, analyzing blogging behavior is crucial to understanding this social interaction.

Several researchers have discussed different aspects of the blogspace in the past few years. The formation of large social networks in blog communities, the interaction among local clusters in the blogosphere, ranking mechanisms for social media [4], information propagation, information epidemics [1], and political blogspace dynamics [2] have been studied in detail.
We will review some of these works in Section 2.

Blogspace provides readers with the opportunity to post their opinions as comments. Comments are important in blog analysis and play an essential role in the blogosphere, as shown in [27, 10]. In this work, we look at the evolution of comment networks, in which nodes represent bloggers and links represent comments. The structure of various networks has been previously studied [5, 8, 9, 23, 13] and modelled [18, 12, 15, 16, 17, 19]. However, no existing model captures the growth of comment networks. Here, we study a large comment network and try to explain its growth behavior.

A comment network differs from the web graph, paper citation networks, and even other types of blog networks for two main reasons. First, a higher frequency of comments between two bloggers might indicate a stronger tie between them. Unweighted networks such as the traditional web, paper citation networks, and blogrolling links lack this feature. Second, a comment is usually followed by the identity of, and a hyperlink to, the person who leaves it. This lets a blogger place links to her own blog on other pages, which helps attract a larger audience. These two features imply that comments are more than mere links, which makes it crucial to take a deeper look at the comment network.

The rest of the paper is organized as follows. We review some of the previous work on blogspace in Section 2. Section 3 describes our dataset and how we build a time-evolving network from the comment data. We make a number of observations on this network and list our findings in Section 4. Finally, in Section 5, we propose a growth model that best describes the comment network.

2 Related Work

A number of studies have focused on the blogspace. To name a few: Adamic et al. [2] studied the network structure of political blogs, and Thelwall [26] gave a descriptive analysis of blog postings around the London attacks.
The dynamics of the blogosphere are studied in [4, 14]. Adar et al. [4] described information epidemics in blogs and introduced a new ranking algorithm for blog pages. Lin et al. [20] defined a mutual awareness relationship to discover communities in the blog social network based on two factors. First, communities are formed according to the actions of individuals in the network. Second, the semantics of the hyperlink structure in blogs differs from that of the traditional web.

While many previous works on weblogs have focused on post data, few researchers have studied the network of comments. Trevino et al. [27] and Gumbrecht et al. [10] showed the importance of comments in blogosphere analysis, and concluded that comments play an essential role in the interactive nature of blogs. Herring et al. [11] studied a small comment dataset of 203 weblogs. A larger-scale study investigates the relation between comments and posts, and extracts commenting patterns based on blog popularity [22]. Mishne and Glance [22] observed that 28% of the nearly 36,000 blogs in their corpus contained comments. They also showed that using blog comments helps improve search recall. However, the corpus used in [22] does not cover a long period of time and is therefore not suited to a study of comments over time. Despite the wide range of previous work on blogs, there is, to the knowledge of the authors, no significant work that models the growth of the comment network.

Weblogs              22,306
Posts                348,700
Comments             1,257,561
Commented Posts      339,884 (97.5%)
Uncommented Posts    8,816 (2.5%)
Table 1. Basic analysis on corpus size and accuracy

3 Preliminaries

3.1 Data

For this study we use the Persian Blog dataset introduced by Qazvinian et al. [25]. The authors crawled a Persian blog host and preprocessed the collected pages to export them into XML files. This dataset contains monthly archives of more than 22,000 weblogs over a 15-month period. The number of posts exceeds 347,800, and these posts contain about 1,258,000 comments, an average of 3.6 comments per post. This average is much higher than that of the blog corpora in [11, 22], for which the average number of comments per post is 0.3 and 0.9 respectively.
Such a high ratio makes this dataset a better candidate for the study of comment networks. Table 1 shows the basic statistics of this dataset.

Similar to other blog datasets, links in this collection are categorized into four different classes:

Blog Roll Link. Blogrolls are hyperlinks placed in the side bar of a blog page; they usually point to blogs or pages read regularly by the blog maintainer.
Post Link. Post links are hyperlinks placed in the content (body) of an entry, pointing to another page.
Comment outlink. Comment outlinks are hyperlinks placed in the content (body) of a comment.
Comment inlink. A comment inlink is a hyperlink left in the footer of a comment; it points to the blog, homepage, or address of the person who leaves the comment.

This dataset has two main properties that distinguish it from other corpora and make it appropriate for this work. First, it includes the comments posted by readers for each entry. Second, it covers 15 months of blog archives. This long period enables us to look at the evolution of the comment network over time.

3.2 Comment Network

We study the structure of the evolving comment network by observing its properties at equally spaced points in time. To achieve this, we introduce a definition of a timed graph. Our definition is similar to the definition of this concept in [15]. Let's define a timed graph, $\mathcal{G}$, to be an ensemble of weighted graph snapshots taken at different time slots. We look at these snapshots as an ordered set,

$\mathcal{G} = \{G(V, E, t, w);\ t = 1, 2, \ldots, n\}$

where $V$ is the set of nodes, $E \subseteq V \times V$ is the set of edges, and $w$ is a positive weight function $w : E \to \mathbb{R}^{+}$ that attaches to every edge $e(i, j) \in E$ a positive weight at time $t$. For simplicity, we denote the graph observed at time $t$, $G(V, E, t, w)$, by $G_t$ and its corresponding weight function by $w_t$. In other words, $w_t(e(i, j))$ is the weight of the directed edge from $i$ to $j$ in $G_t$, and $w_t(e(i, j)) = 0$ means that no such edge exists up to time $t$.

Under this framework, we can build a network of comments in which nodes are bloggers and directed edges represent comments. To build this network, we use the comment inlink data. A comment inlink, as mentioned before, is the hyperlink pointing to the page of the person who leaves the comment. This is important, since a blogger can deliberately place a link to her own blog by leaving a comment on another blogger's post. A directed edge from $i$ to $j$ represents the comments left for $i$ by $j$. In this setting, $w_t(e(i, j))$ is the number of all comments that $j$ has left for $i$ at or before time $t$.

We extracted all comments that were left by bloggers within the network. To make the timed graph, we extracted all comments left in a period of 52 weeks, which is one year's worth of blogging out of the 15-month period covered by the corpus. To ensure the high quality of the comment set, we ignored the first and the last 45 days of the 15-month corpus, since it takes a while until a post receives all of its comments. For an even simpler representation, let's denote the final comment network, $G_{52}$, by $G$. Table 2 summarizes the basic statistics of $G$. The number of nodes in the comment network is less than the number of blogs in the corpus. This indicates that some bloggers have neither left nor received any comments, so we ignore such blogs in this study. The number of comments in the network is equal to the sum of the weights of all edges.

Nodes                                    21,124
Edges                                    69,583
Number of Comments                       226,133
# single nodes                           11,679
# Strongly Connected Components          4,342
# Weakly Connected Components            201
Strongly LCC size                        5,080
Weakly LCC size                          9,169
W/S Clustering Coefficient
Undirected W/S Clustering Coefficient
Diameter                                 11
Undirected Diameter                      10
Average Distance
Undirected Average Distance
Table 2. Basic statistics of the comment network G
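The timed graph described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the input format (a list of `(week, commenter, owner)` tuples) and the function name `build_timed_graph` are assumptions made here. Following the edge convention in the text, a comment left by j on i's blog increments the weight of the edge (i, j).

```python
from collections import defaultdict


def build_timed_graph(comments, n_weeks):
    """Return cumulative weighted snapshots G_1 .. G_n of a comment network.

    comments: iterable of (week, commenter, owner) with week in 1..n_weeks
              (hypothetical input format, not taken from the paper).
    Result:   snapshots[t] maps an edge (owner, commenter) to w_t, the number
              of comments the commenter left for the owner up to week t.
    """
    per_week = defaultdict(list)
    for week, commenter, owner in comments:
        # Edge direction follows the text: from the blog owner i to the commenter j.
        per_week[week].append((owner, commenter))

    snapshots = {}
    current = defaultdict(int)
    for t in range(1, n_weeks + 1):
        for edge in per_week[t]:
            current[edge] += 1          # weights only ever grow over time
        snapshots[t] = dict(current)    # copy, so each snapshot is frozen
    return snapshots
```

Storing a full copy per week keeps the sketch simple; for a 52-week window this is cheap, but a longer study might store only the weekly increments.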
4 Observations

We look at the comment network through different lenses and analyze its structure from different perspectives. In this section we describe the structure of this network and its evolution.

4.1 Growth

For the comment graph, G, we study the number of nodes n(t), the number of edges e(t), and the number of comments c(t) at each point in time t. Leskovec et al. [18] observe densification power-laws in citation networks. In a growing network with an underlying densification power-law, the number of edges is proportional to a power of the number of nodes. We observe the densification power-law in the comment network with the following properties:

$n(t) \propto t$ (1)
$e(t) \propto n(t)^{a_1}$ (2)
$c(t) \propto e(t)^{a_2}$ (3)
$c(t) \propto n(t)^{a_3}, \quad a_3 = a_1 + a_2$ (4)

Figure 1 shows the number of nodes per week, the number of edges versus nodes, the number of comments versus nodes, and the number of comments versus edges. The last three are plotted in a log-log scale and have slopes greater than 1, which confirms a superlinear growth in the number of edges and comments.

Figure 1. Comment network growth. (a) Number of nodes per week. (b) Number of edges vs. nodes. (c) Number of comments vs. nodes. (d) Number of comments vs. number of edges. Panels (b), (c), and (d) are plotted in a log-log scale. Slopes are 94.01, 1.65, 1.97, 1.22 respectively. Fitted curves: (b) $y = 0.021\,x^{1.65}$, (c) $y = 0.003\,x^{1.97}$, (d) $y = 0.294\,x^{1.22}$.

Figure 1 (a) indicates that nodes are added to the network at a constant rate of approximately 92 nodes a week. Because the data has a missing past, and we do not have the network data all the way back to its birth, there is a sudden increase in the number of nodes in the first few weeks.
Therefore, we opted to fit the line starting from the 10th week, assuming that the pre-existing nodes have had at least one activity during the first ten weeks of the observations.

Such a densification power-law in comments and edges should result in the emergence of shrinking diameters. Shrinking diameters have been observed before in citation networks [18] and in the Yahoo! 360 and Flickr social networks [15]. Here, we look at the average directed distance in the strongly connected component of the comment network. Figure 2 (a) shows this value as the graph grows over time. This might suggest that senior bloggers (i.e., bloggers who joined earlier) place comments on other bloggers, which decreases the average distance to their own blog. The network diameter (i.e., the longest shortest path), as shown in Figure 2 (b), is not decreasing in the strongly connected component, for the following reason. While the network diameter shrinks due to the densification power-law, a new node increases the diameter by a value smaller than, or equal to [...]

Figure 2. (a) Average directed distance, and (b) diameter in the strongly connected component of the comment graph vs. the number of nodes

4.2 Distributions

We look at four different distributions in the comment network: indegree, outdegree, and two weight distributions. Each of these features has a different interpretation. The indegree of a node $v_i$ in $G_t$ is the number of people for whom the blogger $v_i$ has left comments by time t. Similarly, the outdegree of $v_i$ in $G_t$ is the number of people from whom the blogger $v_i$ has received comments by time t. We also look at two other features of a node $v_i$: the sum of the weights on its incoming edges, which is the number of comments left by $v_i$, and the sum of the weights on its outgoing edges, which is the number of comments received by $v_i$.

We define the following four random variables at time t: the number of people who place comments on $v_i$ (OutD_t), the number of people who receive comments from $v_i$ (InD_t), the number of comments received by $v_i$ (OutW_t), and the number of comments placed by $v_i$ (InW_t). In this network the degree of a node is equal to OutD_t + InD_t, and the sum of its weights, which is the number of comments it is involved in, is equal to OutW_t + InW_t.
Table 3 summarizes the description of these random variables.

R.V.      Description
InD_t     # people who receive comments from a node
OutD_t    # people who place comments for a node
InW_t     # comments a node places
OutW_t    # comments a node receives
Table 3. Description of four basic random variables

The power-law degree distribution in different networks has been studied before [24]. Our network, however, involves four different distributions, corresponding to the four random variables introduced above. Since placing comments is time consuming, we believe bloggers usually have a limited number of neighbors. Based on a similar argument, Jin et al. [12] indicate that a growing social network does not exhibit a power-law degree distribution. Instead, the distribution is strongly peaked around a certain mean degree and is not remarkably right-skewed. Even though the comment network might be somewhat similar to the friendship network in [12], our observations show that the distributions of the four random variables are consistent with a model in which x is drawn from a power-law of the form $p(x) \propto x^{-\alpha}$. Table 4 shows the exponent (α) and the correlation coefficient (R) for the best-fit power-laws in G (i.e., at t = 52). Figure 3 illustrates these four distributions. The power-law exponent in these four distributions is just slightly above 1. This small exponent might suggest that although the growth of the comment network has some flavor of preferential attachment, it is not an immediate result of that model.

Figure 3. Distribution of (a) OutD_t, (b) OutW_t, (c) InD_t, (d) InW_t
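To make the four random variables and the power-law fit concrete, the sketch below computes InD, OutD, InW and OutW from a weighted edge dictionary and estimates α by a least-squares line through the log-log histogram. The fitting method is an assumption: the text reports an exponent and a correlation coefficient R but does not say how the fit was obtained. The function names and the `{(owner, commenter): weight}` input format are likewise illustrative.

```python
import math
from collections import Counter, defaultdict


def node_variables(weights):
    """Compute InD, OutD, InW, OutW per node (see Table 3).

    weights: {(owner, commenter): w}, i.e. the edge i -> j carries the
             number of comments j left for i (hypothetical input format).
    """
    stats = defaultdict(lambda: {"InD": 0, "OutD": 0, "InW": 0, "OutW": 0})
    for (owner, commenter), w in weights.items():
        stats[owner]["OutD"] += 1       # one more person commenting on owner
        stats[owner]["OutW"] += w       # comments received by owner
        stats[commenter]["InD"] += 1    # one more person the commenter wrote to
        stats[commenter]["InW"] += w    # comments placed by commenter
    return dict(stats)


def fit_power_law(values):
    """Fit p(x) ~ x^(-alpha) by linear regression on the log-log histogram.

    Returns (alpha, r) where r is the absolute correlation coefficient.
    Assumed method: ordinary least squares, no binning, zeros ignored.
    """
    counts = Counter(v for v in values if v > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive values")
    total = sum(counts.values())
    xs = [math.log10(x) for x in sorted(counts)]
    ys = [math.log10(counts[x] / total) for x in sorted(counts)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = abs(sxy) / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return -slope, r
```

A typical use would be `fit_power_law([v["OutD"] for v in node_variables(weights).values()])`. A regression on the raw histogram is sensitive to the sparse tail, so the resulting exponent should be read as a rough estimate.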

Table 4. Best power-law fit for the four basic random variables, at t = 52 (columns: R.V., α, R; rows: OutD_t, OutW_t, InD_t, InW_t)

4.3 Correlation

The correlation coefficient (ρ) of two random variables measures the strength and direction of a linear relationship between them. Table 5 shows the pairwise correlation coefficients of the four random variables. Because this measure is symmetric, only the upper triangle of the matrix is shown. Clearly, there is a high correlation between the number of people one interacts with and the number of comments she receives or leaves. This conclusion is based on two high correlations, ρ(OutD_t, OutW_t) = 0.84 and ρ(InD_t, InW_t) = [...]. Furthermore, the number of comments one leaves exhibits a high correlation with the number of comments she receives, ρ(InW_t, OutW_t) = [...].

Table 5. Correlation coefficient (ρ) of the four basic random variables, at t = 52 (rows and columns: OutD_t, OutW_t, InD_t, InW_t)

4.4 Local Patterns

Local patterns (e.g., triangles) in the comment network are good indicators of bloggers' interactions. In particular, looking at motifs of size 2 and 3 is quite useful for describing the structure of this network. Some previous works have also used local patterns and motifs to describe the structure of expertise social networks [30] and knowledge sharing networks [3]. We compare the occurrence frequencies of subgraphs of size 2 and 3 in the comment network with those in randomized networks [21]. We use FANMOD [29] to extract the motifs from our comment network. Table 6 shows the frequency of each motif structure in the comment network, and its expected frequency in a sample of 1000 randomized versions. We used two randomization processes. The first one, $R_1$, switches the edges between nodes while keeping the number of bidirectional edges globally constant. The second one, $R_2$, randomizes the graph regardless of the number of bidirectional edges.
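A common way to build a randomized comparison network of this kind is degree-preserving edge switching: two edges are picked at random and their targets are exchanged, which keeps every node's indegree and outdegree but scrambles who is connected to whom. The sketch below illustrates that idea only; whether it matches the exact procedure behind $R_2$ in FANMOD is an assumption, and the function names are made up for this example. The reciprocity measure used here (the fraction of directed edges whose reverse edge also exists) is one possible definition.

```python
import random
from collections import Counter


def switch_edges(edges, n_swaps, seed=0):
    """Randomize a directed graph by swapping targets of random edge pairs.

    (a -> b, c -> d) becomes (a -> d, c -> b) unless that would create a
    self-loop or a duplicate edge. In- and out-degrees are preserved.
    """
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_set
    for _ in range(n_swaps):
        p, q = rng.randrange(len(edge_list)), rng.randrange(len(edge_list))
        (a, b), (c, d) = edge_list[p], edge_list[q]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue                      # rejected swap, graph unchanged
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[p], edge_list[q] = new1, new2
    return edge_set


def degree_sequences(edges):
    """Return (out-degree Counter, in-degree Counter) of a directed edge set."""
    return Counter(a for a, _ in edges), Counter(b for _, b in edges)


def reciprocal_fraction(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    return sum((b, a) in edge_set for a, b in edge_set) / len(edge_set)
```

Comparing `reciprocal_fraction` on the observed edges with its value on many switched copies gives a simple estimate of how much reciprocity exceeds what the degree sequence alone would produce.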
Previous works have observed the existence of reciprocal relationships in a variety of social networks. Kumar et al. [15] observe that a large number of edges are bidirectional in the Yahoo! 360 and Flickr friendship networks. In our work, we do not look at the friendship relationship (expressed by blogroll links). Rather, we look at the patterns of comments. Table 6 shows that the share of mutual comments between bloggers is 43.43%. This number is astonishingly lower in a randomized network with the same sparsity. In other words, if John places a comment on Mary's post, there is a substantial chance that he will receive a comment from Mary. This probability could be as high as 0.43 in the comment network, while it is negligible in a randomized one.

Let's call the directed edge from i to j (i.e., a comment from j to i) left at time t a loop-closing edge (comment) if it satisfies the following three conditions.

1. $w_t(e(i, j)) = 1$
2. $w_{t'}(e(i, j)) = 0;\ t' < t$
3. $w_{t-\epsilon}(e(j, i)) > 0$

The first two conditions ensure that an edge exists from i to j at time t but not before that. The third condition indicates that at least one edge from j to i existed before the edge from i to j was created. In other words, a loop-closing edge from i to j is the first comment from j on another blog, i, whose owner has commented on j before.

We would like to find the fraction of edges that are loop-closing. To achieve this, we plot the number of such edges versus the total number of edges as the graph grows. Figure 4 shows this plot, both in a normal scale and in a log-log scale. The regression line in the log-log scale has a slope of a = 0.96. This value, which is very close to 1, suggests a linear relation between the two quantities. Therefore, the total number of edges in the network can be written as

$e(t) = e^{(1)}(t) + e^{(2)}(t)$ (5)

where the component $e^{(1)}(t) = \alpha\, e(t)$ is the number of loop-closing edges, and the component $e^{(2)}(t) = (1 - \alpha)\, e(t)$ represents the bloggers' attempts to initiate new relationships.
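The three conditions above translate directly into a single pass over the comments in chronological order. The sketch below assumes a hypothetical input format, a time-ordered list of `(owner, commenter)` pairs, and counts how many first-time edges close a loop; the function name is illustrative.

```python
def count_loop_closing(events):
    """Count loop-closing edges in a chronological comment stream.

    events: iterable of (i, j) pairs meaning "j left a comment for i",
            ordered by time (hypothetical input format).
    Returns (loop_closing, new_edges): the number of first-time edges
    i -> j for which the reverse edge j -> i already existed, and the
    total number of distinct edges seen.
    """
    seen = set()
    loop_closing = 0
    new_edges = 0
    for i, j in events:
        if (i, j) in seen:
            continue                # conditions 1 and 2 fail: edge is not new
        new_edges += 1
        if (j, i) in seen:          # condition 3: reverse edge existed earlier
            loop_closing += 1
        seen.add((i, j))
    return loop_closing, new_edges
```

Dividing the first returned value by the second gives an empirical estimate of the fraction α in equation 5 for a given stream.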
In our data, according to the regression line in Figure 4 (a), we have α ≈ 0.44.

Figure 4. The number of loop-closing edges versus the number of edges in (a) linear scale, with fitted line $y = 0.44\,x$, and (b) log-log scale, with fitted curve $y = 0.68\,x^{0.96}$

Subgraph structures of size 3 are also illustrated in Table 6. The majority of subgraphs are of type 36, in which a blogger, $b_i$, has placed comments on two other bloggers who have never placed comments on an entry of $b_i$. An interesting pattern is motif 46, where $b_i$ places comments on two other bloggers who comment on each other, but not on $b_i$. The frequency of this motif is significantly higher in the comment network than in both randomized versions. It is also interesting to see that a one-directional loop, as in motif 140, rarely occurs in a comment network. This nonreciprocal transitive relationship accounts for only 0.01 percent of the triads in this network.

In addition to the above observations, we can see in Table 6 that triangles contribute a significantly smaller portion of the triads in the comment network, which causes a small clustering coefficient. Table 2 shows that the clustering coefficient of the undirected G is [...]. Figure 5 illustrates the Watts-Strogatz clustering coefficient [28] of the same network versus weeks and versus the number of nodes. Although the comment network follows a densification power-law [18], in that the number of distinct edges grows superlinearly, as a power of the number of nodes, the clustering coefficient keeps decreasing over time and reaches tiny values. This shows that bloggers do not necessarily place comments on neighbors of their neighbors. This is in contrast with several other well-known social networks, as shown in [17], in which 30% to 60% of edges close triangles (i.e., the tail is only two hops from the head).

Figure 5. Watts-Strogatz clustering coefficient (CC) in the undirected G (a) over time, (b) versus nodes.

5 Model

Based on the above discussions and observations, we would like to propose a growth model for the comment network. One of the major differences between the comment network and previously modelled networks is its dynamic nature. The set of interactions of a blogger is not limited to its arrival time. Rather, we look at a set of comments over time. Previous work has not captured this behavior. In the following, we briefly describe the shortcomings of some of these models.
5.1 Existing Models

In the past decade, several evolution models have been proposed to explain the degree distribution, average shortest path, clustering coefficient, and other properties of online social networks. Classical models such as preferential attachment [6] and the copying model [16] are not comprehensive enough to capture all properties of complex networks. As a result, more advanced models [12, 18, 15] have recently been proposed.

The community guided attachment and the forest fire model [18] require nodes to make links only upon arrival. These models describe citation networks, in which an article cites some previously published articles when it is published. Unlike comment networks, citation networks lack bidirectional edges. Moreover, in the forest fire model, nodes perform breadth-first citations with probabilities that decrease with the breadth level. This naturally forms triangles in citation networks, which might cause a high clustering coefficient.

The social network growth model proposed by Jin et al. [12] does not exhibit a power-law degree distribution. This model assumes a constant number of nodes in the network with a certain mean degree, which means a person can maintain only a certain number of friends. In this model, nodes connect to each other with a probability proportional to the number of their mutual neighbors or friends. This implies a significantly higher clustering coefficient if the friendship decay factor is small, in contrast with the small clustering coefficient observed in comment networks.

The model described by Kumar et al. [15] captures the behavior of the growing friendship networks of Yahoo! 360 and Flickr, where friendship is determined by the appearance of one node in the other's contact list. This model can successfully describe the growth of a friendship network based on blogrolling links, but it does not have an underlying dynamic structure, which is the main characteristic of comment networks. The frequency of interactions between two nodes in a friendship network or a blogroll network is limited to one. However, a blogger can place or receive comments several times in the comment network environment.

Network   Motif frequencies (the two size-2 motifs first, then the size-3 motifs)
G         56.57% 43.43% 28.07% 22.44% 14.82% 13.09% 11.51% 8.52% 0.40% 0.37% 0.26% 0.24% 0.16% 0.11% 0.01%
R1        [...] 43.43% 24.70% 27.57% 10.20% 12.38% 12.38% 12.15% 0.13% 0.17% 0.14% 0.06% 0.07% 0.04% 0.01%
R2        [...] 0.01% 35.43% 22.44% 14.82% 13.09% 20.12% 0.03% 0.71% 0.01% 0.03% 0.02% 0.01% 0.04% 0.14%
S         [...] 6.54% 15.02% 10.30% 3.08% 4.84% 55.56% 0.31% 2.67% 0.17% 0.05% 0.95% 0.18% 0.09%
Table 6. Distribution of motifs in the comment network (G), in randomized networks that maintain the number of bidirectional edges (R1) and that do not maintain it (R2), and in the synthetic network S

5.2 Proposed Model

A model that successfully captures the growth of the comment network should have the following properties. Nodes are added to the network at a constant rate over time. The number of distinct edges should grow non-linearly with the number of nodes. Moreover, the number of comments should also grow non-linearly with the number of links, and therefore with the number of nodes.

The model should exhibit a power-law distribution in indegree, outdegree, the number of comments received, and the number of comments placed, as well as a high correlation between these four basic random variables, InD_t, OutD_t, InW_t, and OutW_t. Moreover, it should create a small slope in the power-law distribution, which indicates that the model is not a direct result of preferential attachment behavior. The growth model should also capture the high fraction of loop-closing edges and the low, non-increasing clustering coefficient.

We propose a simple model for the evolution of the comment network. In our model, a new node joins the network at each time t.
This addition does not necessarily imply the creation of a new blog. A blog may exist long before it receives or leaves its first comment. Here, by the addition of a blog to the comment network, we mean that it joins the network. At each time step t, a new blog places one comment on an existing blog chosen uniformly at random. This causes others to learn about the existence of her blog. During the inter-arrival time of two new nodes, t − 1 and t, a number of $\Delta c_t$ comments are added to the network,

$\Delta c_t = c(t) - c(t-1) \propto n(t)^{a_3} - n(t-1)^{a_3}$ (6)

where $a_3 = a_1 + a_2$. Our model aims to describe how these comments are distributed in the network. Let's assume that newly added comments are made up of two components,

$\Delta c_t = \Delta c^{(1)}_t + \Delta c^{(2)}_t$ (7)

where $\Delta c^{(1)}_t$ accounts for all comments that are left to build new relationships (i.e., non loop-closing new links), and $\Delta c^{(2)}_t$ accounts for all other comments, including loop-closing links. In fact, $\Delta c^{(1)}_t$ is equal to $\Delta e^{(2)}_t = e^{(2)}(t) - e^{(2)}(t-1)$ from equation 5. Thus equation 7 can be written as

$\Delta c_t = \Delta e^{(2)}_t + \Delta c^{(2)}_t$ (8)

In our model, at each time step t, $\Delta c^{(2)}_t = \Delta c_t - \Delta e^{(2)}_t$ comments are left by bloggers for those from whom they have received comments. The probability that j leaves a comment for i at t is proportional to the number of comments j has received from i before t. More formally, the probability that an edge appears from i to j, or that its weight increases (in case it already exists), at time t is proportional to $w_t(e(j, i))$. This component ensures that the probability of receiving a comment increases with the number of comments left.

At the same time, $\Delta e^{(2)}_t$ new edges are distributed in the network with the following procedure. These edges are comments placed to attract more readers. The probability that j is the tail of the edge (the comment placer) decreases linearly with j's outdegree.
Thus, those who have received fewer comments by time t will be more likely to leave comments at t, contributing to the $\Delta e^{(2)}_t$ component. We assume that the head of the edge (the blog that receives the comment) is determined uniformly at random, which is also the case for node arrivals.
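Two small worked steps may help in reading equation 6. They are side calculations added here, not results stated in the text; the proportionality constant $k$ is introduced only for illustration, and the first step assumes, as the model does, that exactly one node arrives per time step, so that $n(t-1) = n(t) - 1$.

```latex
\begin{aligned}
\Delta c_t &= c(t) - c(t-1) = k\left[n^{a_3} - (n-1)^{a_3}\right], \qquad n = n(t), \\
           &\approx k\, a_3\, n^{a_3 - 1} \quad \text{for large } n .
\end{aligned}
```

With $a_3 > 1$, the number of comments added per arriving node therefore itself grows with the size of the network. Separately, substituting equation 2 into equation 3 gives

```latex
c(t) \propto e(t)^{a_2} \propto \left(n(t)^{a_1}\right)^{a_2} = n(t)^{a_1 a_2},
```

and for the slopes reported with Figure 1 ($a_1 = 1.65$, $a_2 = 1.22$) the product $a_1 a_2 \approx 2.01$ is close to the fitted exponent 1.97, whereas the sum is 2.87. The stated relation between $a_3$, $a_1$ and $a_2$ is probably best read with that in mind.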

The intuition behind this model is clear. We assume that the time and cost available for placing comments in the blogosphere are limited. Therefore, bloggers who receive a lot of comments will spend most of their time replying, while those who receive fewer comments, or have a smaller set of friends, try to place comments on strangers.

5.3 Simulation Results

In this section we describe the simulation of our model. The procedure to generate the synthetic network is as follows. In each iteration t,

1. The node t joins the network by adding an edge $e(j, t)$ with weight 1, where $j < t$ is determined uniformly at random.
2. $(1 - \alpha)\,\Delta e_t$ new links are added to the network. Each of these edges is from i to j, where j is selected with a probability decreasing with j's outdegree, and i is chosen uniformly at random. Here, $\Delta e_t = e(t) - e(t-1)$ and $e(t) = n(t)^{a_2}$.
3. $\Delta c_t - (1 - \alpha)\,\Delta e_t$ edges are distributed in the network. The probability of formation of the edge $e(i, j)$ (or of an increment in its weight, in case of an existing edge) is proportional to the weight of $e(j, i)$. Here, $\Delta c_t = c(t) - c(t-1)$ and $c(t) = n(t)^{a_3}$.

In our simulation, we set $\alpha = 1/2$, $a_2 = 1.4$, and $a_3 = 1.7$, and create a 2,000-node network. Let's denote the final synthetic network by S. Table 7 shows the basic statistics of S. This table confirms that the model creates a low clustering coefficient, as well as a small average distance.

Nodes                        2,000
Edges                        80,523
Number of Comments           411,372
W/S Clustering Coefficient   0.27
Diameter                     6
Average Distance             2.10
Table 7. Statistics of the synthetic comment network S

The distributions for the simulated network exhibit a power-law with small slopes. The model has some flavor of preferential attachment, in that receiving a comment has a probability proportional to OutW. However, new links are generated based on a uniform distribution, which produces small slopes in the log-log plots.
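The three-step procedure above can be sketched as a small simulation. This is an illustrative reading of the steps, not the authors' code, and several details are assumptions made here: the "probability decreasing with outdegree" is implemented as a weight of `max_out + 1 - outdegree`; the sampling weights are computed once per time step rather than after every comment; fractional edge and comment counts are rounded; and a step-2 comment that happens to hit an existing edge simply increments its weight.

```python
import random
from collections import defaultdict


def simulate_comment_network(n_nodes, alpha=0.5, a2=1.4, a3=1.7, seed=0):
    """Grow a synthetic comment network; returns {(i, j): weight}.

    Edge (i, j) means "j left comments for i", as in the text.
    """
    rng = random.Random(seed)
    weights = defaultdict(int)   # (i, j) -> number of comments j left for i
    out_deg = defaultdict(int)   # i -> number of distinct commenters on i

    def add_comment(i, j):
        if weights[(i, j)] == 0:
            out_deg[i] += 1
        weights[(i, j)] += 1

    for t in range(1, n_nodes):
        nodes = list(range(t + 1))           # nodes 0..t, including the newcomer

        # Step 1: the new node t comments on a uniformly chosen older node.
        add_comment(rng.randrange(t), t)

        # Targets for this step from e(t) = n^a2 and c(t) = n^a3 (rounded).
        d_edges = round((t + 1) ** a2) - round(t ** a2)
        d_comments = round((t + 1) ** a3) - round(t ** a3)
        n_new = round((1 - alpha) * d_edges)

        # Step 2: relationship-initiating comments; the commenter j is less
        # likely to be picked the larger its outdegree (assumed linear form).
        max_out = max(out_deg[v] for v in nodes)
        placer_w = [max_out + 1 - out_deg[v] for v in nodes]
        for _ in range(n_new):
            j = rng.choices(nodes, weights=placer_w)[0]
            i = rng.choice(nodes)
            if i != j:
                add_comment(i, j)

        # Step 3: replies; pick an existing edge (j, i) with probability
        # proportional to its weight, then add a comment on the edge (i, j).
        edges = list(weights)
        edge_w = [weights[e] for e in edges]
        n_reply = max(0, d_comments - n_new)
        for j, i in rng.choices(edges, weights=edge_w, k=n_reply):
            add_comment(i, j)

    return dict(weights)
```

For example, `simulate_comment_network(2000)` would use the parameter values quoted above, although the statistics it produces need not match Table 7 given the assumptions listed.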
The slopes for the real network are slightly above 1, and those for the synthetic data with the above parameter setting are below 1. Figure 6 illustrates the distributions of the four random variables, OutD, OutW, InD, and InW, in log-log scale. We also calculated the correlation coefficients of these four variables. Table 8 shows a high correlation between these variables in the network created by the model, which confirms the earlier observation on the real data.

Table 8. Correlation coefficient (ρ) of the four basic random variables, in the synthetic network (rows and columns: OutD_t, OutW_t, InD_t, InW_t)

Since the network growth model encourages mutual comments (i.e., comments in both directions), we expect a high number of loop-closing and reciprocal edges. The fractions of reciprocal edges in S and G are 43.64% and 43.43% respectively. We extracted the network motifs for the synthetic network as well. The fourth row in Table 6 lists the distribution of motifs of size 2 and 3 in the synthetic data. The frequencies of the different subgraphs of size 3 indicate that the synthetic network exhibits the same microscopic properties as the comment network. Triangles still make up a significantly smaller portion of the triads in the network. This causes the low clustering coefficient of the synthetic network, approximately 0.27.

The only significant differences in motif frequencies between G and S are those of motifs 36, 6, and 78. Motif 36 is the case where a blogger i has placed comments on j and k but has received no comments from them. Motif 6 is slightly different: a blogger i has received comments from j and k, but has never replied. However, if the bloggers were to leave loop-closing comments with higher probabilities, then both motifs would change to motif 78. In motif 78, a blogger i has received comments from j and k, and has placed comments on both of them as well.
The probability of forming a new edge that does not close a loop is indirectly controlled by one parameter, α, which is set to 0.5 in our simulation. If we decrease α, fewer loop-closing edges will appear, and the frequency of motif 78 will decrease. This immediately increases the frequency of both motifs 36 and 6.

5.4 Discussion

The model introduced in this paper describes the growth of the comment network in a specific blogosphere. Our method is based on empirical analysis of data and observations on a single large dataset. The model is formalized with three parameters: $a_3$, $a_2$, and α. As shown, it successfully describes the growth of the giant component in the comment network. However, because each node connects to the giant component upon arrival, the model creates a single weakly connected component, whereas Table 2 shows that the giant component in the comment network barely constitutes half of the network and that there are 201 weakly connected components. One way to create a network with several disconnected components is to introduce a new parameter: each new node, upon arrival, makes a link with a certain probability. Nodes that do not connect to the giant component may later receive comments from future new nodes and form smaller components. This produces the growth of different components, each of which follows the model described in this paper. An empirical or exact analytical solution to the growth of the comment network with disconnected components might be a good future direction.

Figure 6. Distributions of (a) OutD, (b) OutW, (c) InD, (d) InW in S, with best power-law fit exponents α = 0.61 (R = 0.73), α = 0.66 (R = 0.76), α = 0.73 (R = 0.91), and α = 0.79 (R = 0.95), respectively.

6 Conclusion and Future Work

In this paper we introduced, illustrated, and analyzed our understanding of a rich and dense comment network extracted from the PersianBlog dataset. Our goal was to measure the characteristics of this network and to provide a rather simple model to explain the behavior of the comment network. Our observations indicate a high correlation between the number of comments placed and the number of comments received by bloggers. This suggests that the more comments a blogger places, the more comments she receives from other bloggers. We also examined the distributions of the major network parameters: indegree, outdegree, and the number of comments placed and received by bloggers.
The power-law exponents for the distributions of all of these parameters are slightly above 1, which suggests that the comment network has only a very weak flavor of preferential attachment. We believe this is due to the lack of global knowledge of the blogosphere: either bloggers do not know who is well connected and famous, or they do not place comments on those blogs. In addition, our study illustrated that bloggers' commenting behavior does not tend to form triangles, which means that they do not place comments on the blogs of friends of friends. We showed that this behavior causes the low clustering coefficient of the network. The comment network has a special characteristic: it gives readers the opportunity to post their opinions as comments. This unique feature makes the blog environment an appropriate place for spreading link spam and spam comments to increase popularity.

Our future work is in three directions. First, we are interested in analyzing the comment network from a game-theoretic point of view, to find the best strategy for placing comments on other blogs to maximize one's score (e.g., PageRank). As mentioned earlier, this problem is entirely different from other well-known PageRank applications. Second, we showed that bloggers' strategies for placing comments on other blogs do not follow a preferential attachment model; hence, detecting spam requires new approaches, in contrast with well-known spam detection techniques for the Internet and e-mail. We illustrated that the analysis of motifs can give us valuable information about the microscopic behavior of bloggers. We plan to continue this work for spam detection using motif-based features. Third, it is worthwhile to compare the network of comments with those of post links and blogrolls. A major fraction of comments fall between friends, whose relationships could be determined by studying blogroll links.
It is also important to look at comment networks of different languages or genres. One good future direction might be to see whether the comment network differs from one category to another (e.g., Science vs. Sports), and whether cultural issues might affect the network properties.

References

[1] L. Adamic and E. Adar. Tracking information epidemics in blogspace. In Proc. of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence.
[2] L. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the WWW2005 Conference's 2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis, and Dynamics.
[3] L. A. Adamic, J. Zhang, E. Bakshy, and M. Ackerman. Knowledge sharing and Yahoo Answers: Everyone knows something. In Proceedings of the International Conference on World Wide Web (WWW2008).
[4] E. Adar, L. Zhang, L. A. Adamic, and R. M. Lukose. Implicit structure and the dynamics of Blogspace. In WWW-WS2004B.
[5] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 72(1):48-97.
[6] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286.
[7] L. Blankenship. Blogging science: The spin and what we can do about it. In The Center of Science in Society, Brown Bag Discussion Group, April.
[8] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Comput. Netw., 33(1-6).
[9] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM '99: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, New York, NY, USA. ACM.
[10] M. Gumbrecht. Blogs as protected space. In WWW 2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics, at WWW '04: the 13th International Conference on World Wide Web.
[11] S. C. Herring, L. A. Scheidt, S. Bonus, and E. Wright. Bridging the gap: A genre analysis of weblogs. In HICSS '04: Proceedings of the 37th Annual Hawaii International Conference on System Sciences (HICSS '04) - Track 4, Washington, DC, USA. IEEE Computer Society.
[12] E. M. Jin, M. Girvan, and M. E. Newman. Structure of growing social networks. Phys. Rev. E, 64(4):046132, Sep.
[13] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88-90, January.
[14] R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW '03: Proceedings of the 12th International Conference on World Wide Web, New York, NY, USA. ACM.
[15] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA. ACM.
[16] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, page 57, Washington, DC, USA. IEEE Computer Society.
[17] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[18] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: Densification law, shrinking diameters and possible explanations. In Proc. of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[19] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Twelfth International Conference on Information and Knowledge Management. ACM, November.
[20] Y.-R. Lin, H. Sundaram, Y. Chi, J. Tatemura, and B. L. Tseng. Blog community discovery and evolution based on mutual awareness expansion. In WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pages 48-56, Washington, DC, USA. IEEE Computer Society.
[21] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: Simple building blocks of complex networks. Science, 298(5594), Oct.
[22] G. Mishne and N. Glance. Leave a reply: An analysis of weblog comments. In Third Annual Workshop on the Weblogging Ecosystem, Edinburgh, Scotland, May.
[23] M. J. Newman. The structure and function of complex networks.
[24] M. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46.
[25] V. Qazvinian, A. Rassoulian, and M. Shafiei. A large-scale study on Persian weblogs. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, Workshop of TextLink2007.
[26] M. Thelwall. Bloggers during the London attacks: Top information sources and topics. In Proceedings of the WWW06 Workshop on Web Intelligence.
[27] E. M. Trevino. Blogger motivations: Power, pull, and positive feedback. In Internet Research 6.0.
[28] D. J. Watts and S. Strogatz. Collective dynamics of small-world networks. Nature, 393, June.
[29] S. Wernicke and F. Rasche. FANMOD: a tool for fast network motif detection. Bioinformatics, 22(9).
[30] J. Yang, L. A. Adamic, and M. S. Ackerman. Competing to share expertise: the Taskcn knowledge sharing community. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM2008).

