Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength
1 Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength
2 Authors: Mirella M. Moro is an associate professor at the Computer Science Department at UFMG (Belo Horizonte, Brazil). Michele A. Brandão is currently at UFMG as substitute professor (professor substituto) and postdoctoral resident.
3 1. Introduction. Context: social networks represent relationships and interactions among individuals. Their models and patterns help solve different problems.
6 Everybody knows: what an online social network is; the most used ones; who uses them (next slide); how to use them; how to profit from them (?); who the most influential people are (?); how it evolves (?).
7 Three-quarters of Facebook users and half of Instagram users use each site daily. [Chart: among the users of each social networking site, % who use these sites.]
8 [Chart: number of social network users in selected countries in 2017 and 2022 (in millions).]
9 Social Networks Value: The people with whom we interact on a regular basis, and even some with whom we interact only sporadically, influence our beliefs, decisions and behaviors. Matthew O. Jackson: An Overview of Social Networks and Economic Applications. In: Handbook of Social Economics. Edited by Jess Benhabib, Alberto Bisin and Matthew O. Jackson. Elsevier,
10 Social Networks Value: Examples of the effects of social networks on economic activity are abundant and pervasive, including roles in transmitting information about jobs, new products, technologies, and political opinions. (Same source: Jackson, An Overview of Social Networks and Economic Applications.)
11 Social Networks Value: Networks of relationships among various firms and political organizations affect research and development, patent activity, trade patterns, and political alliances. (Same source.)
12 Social Networks Value: Given the many roles of networks in economic activity, they have become increasingly studied by economists. (Same source.)
13 Example (Brazil): mining over Twitter data. Outbreak detection; predicting future outbreaks; planning combat actions properly. (Image: pixabay.com)
14 Outline: 1. Introduction; 2. Background; 3. General taxonomy for social networks; 4. Tie strength over academic social networks; 5. Collaboration strength metrics and analyses on GitHub; 6. Tie strength application; 7. Final thoughts and future work
15 How to Model Social Networks?
16 Social Networks Models: homogeneous
17 Social Networks Models: heterogeneous (bipartite)
18 Social Networks Models: heterogeneous (multipartite)
19 Social Networks Models: heterogeneous (multigraphs)
20 Social Networks Models: heterogeneous (multilayer)
21 Heterogeneous network, GitHub example. Networks shown: Ruby Developer Network, JavaScript Developer Network, Python Developer Network. Relationships: based on contributing to the same repository; user X follows user Y; user X opened an issue assigned to Y; users commented on the same issue; user X watches a repository owned by Y.
22 Why use social network data?
24 + Geographic information
25 R. Caldelli, R. Becarelli and I. Amerini. Image Origin Classification Based on Social Network Provenance in IEEE Transactions on Information Forensics and Security, vol. 12, no. 6, pp ,
26 + Geographic information C.-K. Hsieh et al. Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces. In Proceedings of the 25th International Conference on World Wide Web, 51-62,
27 What are the properties in social networks?
28 Nodes, edges, degree, betweenness, closeness, neighborhood overlap, PageRank, interaction frequency, clustering coefficient, path length
29 We focus on tie strength
31 3. General Taxonomy for SN M. A. Brandão & M. M. Moro. Social Professional Network: a Survey and Taxonomy. Journal of Computer Communications (COMCOM), v. 100, p. 20, 2017
32 Social Networks Research Topics. Issues: crawling, storing, managing and treating data from the networks. Tasks: the way the networks can be analyzed, used, improved and applied.
33 Issues: problems within social networks regarding their maintenance and usage. Tasks: problems whose solutions benefit from using SN data.
36 Social Networks
38 Social Professional Networks. Motivation: more than 20 websites for social professional purposes; additional challenge: emotional reasons; different studies.
43 4. Tie strength over academic social networks
M. A. Brandão & M. M. Moro. Analyzing the Strength of Co-authorship Ties with Neighborhood Overlap. In DEXA 2015.
M. A. Brandão & M. M. Moro. Neighborhood Overlap: Can This Metric Be Used to Characterize the Strength of Co-authorship Ties? In Grace Hopper Celebration, 2015.
M. A. Brandão, M. A. Diniz & M. M. Moro. Using Topological Properties to Measure the Strength of Co-authorship Ties. In BraSNAM 2016.
M. A. Brandão & M. M. Moro. The Strength of Co-authorship Ties through Different Topological Properties. JBCS, 23(1):5, 2017.
M. A. Brandão, P. O. S. V. Melo & M. M. Moro. Tie Strength Dynamics over Temporal Co-authorship Social Networks. In: IEEE/WIC/ACM Web Intelligence, 2017.
44 Context. Current trend: social network analyses. Central aspect: tie strength.
45 1. How to measure tie strength?
46 "Tie strength may be measured by a combination of the amount of time, the cooperation intensity and the reciprocal services that characterize the tie" [Granovetter, 1973]
47 2. How to analyze tie strength?
48 Overview
49 Research Questions: 1. How can we measure the strength of co-authorship ties? 2. Can we identify which aspects impact the strength of co-authorship ties? 3. How is tie strength defined for temporal networks? 4. How much does the strength of ties vary over time?
50 Relevance: relevant knowledge; different applications. Academic context: ranking, name disambiguation.
51 Related Work
52 Related Work over Tie Strength. Tie strength overview: Granovetter [1973]. Work ties [Castilho et al., 2017]. Friendship ties [Seo et al., 2017; Zignani et al., 2016]. Contact ties [Wiese et al., 2015]. Developer ties [Alves et al., 2016].
53 Related Work over Tie Strength. Temporal networks: temporal aspects have not been widely explored yet. Kostakos [2009] and Nicosia et al. [2013] propose a set of network properties that consider the temporal aspect.
54 Preliminary Study: The Strength of Co-authorship Ties
56 Dataset. [Chart: distribution of the number of co-authors for researchers in each area.]
57 Dataset. [Same chart.] In sociology, of 7,195 publications, 83.96% have only one author.
58 Neighborhood Overlap: Characterizing Tie Strength
59 Impact of Properties (correlation analyses, regression analyses). Legend: SLC = strongly linearly correlated; EC = exponentially correlated. Areas: CS, Med, Soc.
Clustering coefficient: SLC (CS), SLC (Med), SLC (Soc)
Edge betweenness: EC (CS), EC (Med), EC (Soc)
Number of triangles: SLC
Eigenvector: SLC
Closeness: EC
Eccentricity: EC
(For the last four properties, the area column is not recoverable from the transcript.)
60 Tie Strength Properties: neighborhood overlap (NO); absolute frequency of interaction, i.e. edge weight (W), which here is the co-authorship frequency.
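The two properties on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the common textbook definition of neighborhood overlap (shared neighbors divided by all neighbors of either endpoint, excluding the endpoints themselves), and the function names and the toy paper list are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_coauthorship(papers):
    """Build adjacency sets and edge weights from a list of author lists."""
    neighbors = {}
    weight = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
            weight[(a, b)] += 1  # W: number of papers written together
    return neighbors, weight


def neighborhood_overlap(neighbors, a, b):
    """Shared neighbors over all neighbors of a or b, excluding a and b."""
    common = neighbors[a] & neighbors[b]
    union = (neighbors[a] | neighbors[b]) - {a, b}
    return len(common) / len(union) if union else 0.0


# Hypothetical toy data: each inner list is one paper's author list.
papers = [["ana", "bob", "cid"], ["ana", "bob"], ["bob", "dan"]]
neighbors, weight = build_coauthorship(papers)
no_ab = neighborhood_overlap(neighbors, "ana", "bob")
w_ab = weight[("ana", "bob")]
```

Keeping NO and W as separate values makes it easy to see cases where they disagree, for instance a frequent co-author pair with no common co-author.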
61 Tie Strength over Non-temporal Social Network
62 Problems with NO and W. Case 1: no common co-author
63 Problems with NO and W. Case 2: no indication of whether the tie is inside a community or not
64 Problems with NO and W. Case 3: many common co-authors
65 Problems with NO and W. Case 4: results too small or too high
66 Dataset
Dataset | Number of nodes | Number of edges | Period
DBLP Articles | 837,583 | 2,935, | to 2015
DBLP Inproceedings | 945,297 | 3,760, | to 2015
PubMed | 443,784 | 5,550, | to 2016
APS | 180, | , | to
67 Correlation Coefficients between neighborhood overlap and co-authorship frequency. [Table: Kendall, Pearson and Spearman coefficients for DBLP Articles, DBLP Inproceedings, PubMed and APS; values not preserved.]
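To make the comparison concrete, the three coefficients can be computed over per-edge values of neighborhood overlap and co-authorship frequency. The sketch below is a plain-Python illustration with hypothetical toy values, not the datasets' results; the Kendall variant shown is tau-a, which does not correct for ties, and the slides do not say which variant was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical per-edge values: monotonic but not linear.
overlap = [0.0, 0.1, 0.2, 0.5, 0.9]
frequency = [1, 2, 3, 5, 12]
r_pearson = pearson(overlap, frequency)
r_spearman = spearman(overlap, frequency)
r_kendall = kendall_tau_a(overlap, frequency)
```

On monotonic but non-linear data like this, the rank-based coefficients reach 1 while Pearson stays below 1, which is one reason to report all three.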
68 Tieness: our new metric [formula built up over slides 68 to 71; not preserved]
72 Tieness: overall level of tieness in a SN
73 Tieness, real case: top 10 co-authors of José Palazzo M. de Oliveira (columns: co-author, co-authorship frequency, NO, tieness; values not preserved): Leandro Krug Wives, Ana Marilza Pernas, Stanley Loh, Isabela Gasparini, Daniel Lichtnow, Marcelo Soares Pimenta, Giseli Rabello Lopes, Alencar Machado, José Valdeni de Lima, Mirella M. Moro.
74 Tieness, real case (same table): NO, Tieness
75 Tieness, real case (same table): how can we differentiate the two co-authors?
77 Tieness: nominal scale; analysis of agreement with Granovetter's theory
78 Until now: Introduction; Background; General taxonomy for social networks; Tie strength over academic social networks (tie strength in non-temporal networks)
79 Tie Strength over Temporal Social Network
80 Time
81 A Justification for Using the Temporal Aspect. Most work focuses only on strong and weak ties (e.g. Granovetter [1973]). RECAST (Vaz de Melo et al. [2015]) is slow and does not consider the duration of the interactions. Other approaches need social network data (e.g. history of chat messages).
82 Background: static aggregated graph; dynamic graph (snapshots t-1, t)
83 RECAST (Random relationship ClASsifier strategy). Random graphs. Two features: neighborhood overlap and edge persistence. Small networks.
84 fast-recast: builds more than one random graph at a time; computes edge persistence and topological overlap in parallel; optimizes memory use.
85 fast-recast (fast Random relationship ClASsifier strategy). Classes and their properties:
Friends (strong): frequent interactions and many common friends
Acquaintances (weak): infrequent interactions and many common friends
Bridge: frequent interactions and not many common friends
Random: infrequent interactions and not many common friends
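The four classes above amount to a two-feature decision table, which can be sketched as follows. This is an illustration only: how RECAST or fast-recast decides that a value is "frequent" or "many" (by comparison against random graphs) is reduced here to a simple threshold test, and the threshold values and function names are hypothetical.

```python
def is_social(observed, random_threshold):
    """Treat a feature as 'social' when it exceeds the value expected at random.

    The threshold is assumed to come from random-graph baselines; how it is
    derived is outside this sketch.
    """
    return observed > random_threshold


def recast_class(frequent_interactions, many_common_friends):
    """Map the two boolean features to the four classes listed on the slide."""
    if frequent_interactions and many_common_friends:
        return "friends"
    if many_common_friends:
        return "acquaintances"
    if frequent_interactions:
        return "bridge"
    return "random"


# Hypothetical edge measurements and thresholds.
edge = {"persistence": 0.6, "overlap": 0.05}
thresholds = {"persistence": 0.2, "overlap": 0.1}
label = recast_class(
    is_social(edge["persistence"], thresholds["persistence"]),
    is_social(edge["overlap"], thresholds["overlap"]),
)
```

In this example the edge is persistent but has little overlap, so it falls into the bridge class.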
86 fast-recast [Chart: results for the PubMed dataset]
87 fast-recast [Charts: relationship classes for DBLP Articles, DBLP Inproceedings, PubMed and APS]
88 Our new algorithm
89 STACY (Strength of Ties Automatic-Classifier over the Years). Three features: neighborhood overlap (# of common friends); edge persistence (interaction frequency); co-authorship frequency. Random graphs.
90 STACY
91 STACY relationship classes
Class | Social interaction frequency | Common friends | Interaction intensity
Class1 - strong | high | high | high
Class2 - bridge+ | high | low | high
Class3 - transient | low | high | high
Class4 - periodic | high | high | low
Class5 - bursty | low | low | high
Class6 - bridge | high | low | low
Class7 - weak | low | high | low
Class8 - random | low | low | low
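The eight classes can be read as a lookup over three high/low features. The sketch below encodes one reading of the slide's table, with columns taken in the order social interaction frequency, common friends, interaction intensity; that column order and the placement of the transient and periodic rows are reconstructed from a flattened table and should be treated as assumptions. The function name is hypothetical.

```python
# Keys: (interaction frequency high?, common friends high?, intensity high?)
STACY_CLASSES = {
    (True, True, True): "strong",
    (True, False, True): "bridge+",
    (False, True, True): "transient",
    (True, True, False): "periodic",
    (False, False, True): "bursty",
    (True, False, False): "bridge",
    (False, True, False): "weak",
    (False, False, False): "random",
}


def stacy_class(frequency_high, friends_high, intensity_high):
    """Return the class name for one tie given its three high/low features."""
    key = (bool(frequency_high), bool(friends_high), bool(intensity_high))
    return STACY_CLASSES[key]


example = stacy_class(frequency_high=False, friends_high=False, intensity_high=True)
```

Compared with the four-class scheme, the third feature splits each class in two, for instance separating bridge from bridge+.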
92 STACY relationship classes [Charts: classes 1 (strong) to 8 (random) for DBLP Articles and DBLP Inproceedings]
93 STACY relationship classes [Charts: classes 1 (strong) to 8 (random) for PubMed and APS]
94 STACY relationship classes
95 STACY relationship classes: 1 - strong, 2 - bridge+, 3 - transient, 4 - periodic, 5 - bursty, 6 - bridge, 7 - weak, 8 - random
97 Temporal Networks: Tie Strength Concept. Strong ties: persistent. Weak ties: sporadic.
98 Tie Strength Varying over Time (80%-20%). [Table: link persistence under fast-recast and STACY for DBLP Articles, DBLP Inproceedings, PubMed and APS; values not preserved.] STACY classifies strong ties better than fast-recast. Strong ties and bridges persist more than others.
99 Tie Strength Varying over Time (50%-50%, PubMed). [Table: link transformation; rows Strong, Bridge, Transient, Periodic, Bursty, Bridge, Weak, Random; columns Periodic, Bridge, Disappear; values not preserved.] Most ties tend to disappear. STACY reveals that ties tend to change to class4 (periodic) and class6 (bridge). The paper provides more details.
100 Deriving a new computational model
101 Temporal_tieness: low computational cost
102 Temporal_tieness (low computational cost): edge persistence
103 Temporal_tieness (low computational cost): neighborhood overlap (or topological overlap)
104 Temporal_tieness (low computational cost): edge weight
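Two of the ingredients named on these slides, edge persistence and edge weight, can be computed from a sequence of network snapshots. The sketch below assumes edge persistence means the fraction of time steps in which an edge appears, and edge weight the total number of interactions over all steps; how temporal_tieness combines them is not shown in the transcript and is not attempted here. Function names and the toy snapshots are hypothetical.

```python
from collections import Counter


def edge_persistence(snapshots):
    """Fraction of snapshots in which each undirected edge appears."""
    seen = Counter()
    for edges in snapshots:
        # A set removes repeats within one snapshot; frozenset ignores direction.
        for edge in {frozenset(pair) for pair in edges}:
            seen[edge] += 1
    return {edge: count / len(snapshots) for edge, count in seen.items()}


def edge_weight(snapshots):
    """Total number of interactions per undirected edge over all snapshots."""
    total = Counter()
    for edges in snapshots:
        for pair in edges:
            total[frozenset(pair)] += 1
    return total


# Hypothetical yearly snapshots; each pair is one joint publication.
snapshots = [
    [("a", "b"), ("a", "b"), ("b", "c")],
    [("b", "a")],
    [("a", "b"), ("c", "d")],
    [],
]
persistence = edge_persistence(snapshots)
weights = edge_weight(snapshots)
ab = frozenset(("a", "b"))
```

Here the a-b edge has weight 4 but persistence 0.75, which shows how the two features capture different aspects of the same tie.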
105 Conclusion: measuring the strength of co-authorship ties; aspects that impact the strength of ties; tie strength for temporal networks; varying over time
106 Conclusion: methods to automatically detect relationship classes; differentiating random from social relationships
108 5. Collaboration Strength Metrics and Analyses on GitHub
N. A. Batista, M. A. Brandão, G. B. Alves, A. P. C. Silva & M. M. Moro. Collaboration Strength Metrics and Analyses on GitHub. In: 2017 IEEE/WIC/ACM Web Intelligence, 2017.
N. A. Batista, G. B. Alves, A. Gonzaga & M. Brandão. GitSED: Um Conjunto de Dados com Informações Sociais baseado no GitHub. In SBBD DSW, 2017.
G. B. Alves, M. A. Brandão, D. M. Santana, A. P. C. Silva & M. M. Moro. The Strength of Social Coding Collaboration on GitHub. In SBBD - Short papers,
109 Context: social coding is a software development approach that enables collaboration among developers.
110 Relevance: evaluating the strength of developers' relationships can help improve the recommendation of developers to work on a project and fix bugs, and the productivity analysis of a development team.
111 How to measure the strength of collaboration between developers on GitHub?
112 Related Work: Casalnuovo et al. [2015] analyze the productivity of developers in projects. Bartusiak et al. [2016] predict developer collaboration. Tsay et al. [2014] investigate the acceptance of pull requests.
113 Related Work: such studies measure the strength of interactions differently. However, none evaluates the best way to measure such strength, and none investigates the correlation among such metrics.
114 Contributions: analyze the strength of social collaboration measured by distinct metrics across different programming languages, giving insights on relationship patterns.
1. GitSED - GitHub Socially Enhanced Dataset: curated (filtered on two programming languages), augmented (data not available on GHTorrent), and enriched (social network information)
2. Three new metrics for the strength of social coding collaboration: number of lines in commits, potential of contribution, prior social interaction
3. Evaluation of all metrics over two social networks, for JavaScript and Ruby
115 Database: GHTorrent, database from September 2015. Not including forked and deleted projects. Two programming languages: JavaScript and Ruby.
118 Database: GitSED (GitHub Socially Enhanced Dataset)
119 Collaboration Network. Links: two developers contribute to the same repository. Weights: given by topological and semantic metrics.
122 Collaboration Network: weights attributed from the proposed metrics
123 Topological Properties
Clustering Coefficient: the tendency of the nodes to cluster
Neighborhood Overlap: computes the strength of the links
Adamic-Adar: gives more weight to low-degree common neighbors
Preferential Attachment: the rich get richer
Resource Allocation: how a node indirectly influences its pair's neighborhood
Tieness: combination of Neighborhood Overlap and weight
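Three of the listed properties have short standard definitions that can be sketched directly over adjacency sets. This is an illustration using the usual link-analysis formulas (Adamic-Adar sums 1/log(degree) over common neighbors, Resource Allocation sums 1/degree, Preferential Attachment multiplies the two degrees); the toy graph and function names are hypothetical, and Tieness is left out because its formula is not given in the transcript.

```python
from math import log


def adamic_adar(nbrs, u, v):
    """Sum of 1/log(degree) over common neighbors; low-degree ones weigh more."""
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1 / log(len(nbrs[z])) for z in nbrs[u] & nbrs[v])


def resource_allocation(nbrs, u, v):
    """Sum of 1/degree over common neighbors."""
    return sum(1 / len(nbrs[z]) for z in nbrs[u] & nbrs[v])


def preferential_attachment(nbrs, u, v):
    """Product of the two degrees: well-connected nodes score high together."""
    return len(nbrs[u]) * len(nbrs[v])


# Hypothetical undirected graph as symmetric adjacency sets.
nbrs = {
    "u": {"a", "b"},
    "v": {"a", "b", "c"},
    "a": {"u", "v"},
    "b": {"u", "v", "c"},
    "c": {"v", "b"},
}
aa = adamic_adar(nbrs, "u", "v")
ra = resource_allocation(nbrs, "u", "v")
pa = preferential_attachment(nbrs, "u", "v")
```

All three take a pair of nodes, so they can be used as edge weights in the same way as neighborhood overlap.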
124 SR - Number of Shared Repositories: number of repositories shared by a pair of developers. Example: 5 repositories.
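SR reduces to a set intersection once each developer's repositories are known. A minimal sketch, assuming a mapping from developer to the set of repositories they contributed to; the mapping, names and function are hypothetical.

```python
def shared_repositories(repos_by_dev, a, b):
    """SR: number of repositories both developers contributed to."""
    return len(repos_by_dev[a] & repos_by_dev[b])


# Hypothetical contribution data.
repos_by_dev = {
    "dev_x": {"r1", "r2", "r3", "r4", "r5", "r9"},
    "dev_y": {"r1", "r2", "r3", "r4", "r5"},
    "dev_z": {"r7"},
}
sr_xy = shared_repositories(repos_by_dev, "dev_x", "dev_y")
sr_xz = shared_repositories(repos_by_dev, "dev_x", "dev_z")
```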
125 JCSR - Jointly developers contribution to shared repositories: contribution of a pair of developers relative to the others in the same repository. Examples: 2 / 2 = 1; 2 / 3 = 0.66
126 JCOSR - Jointly developers commits to shared repositories: number of commits of a pair of developers in shared repositories. Examples: (15 + 3) / 18 = 1; ( ) / 160 = 0,
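The worked examples for JCSR and JCOSR suggest simple per-repository ratios, sketched below. This is one reading of the slides, not the paper's definition: JCSR is taken as the pair (2 developers) over the number of developers in the repository, and JCOSR as the pair's commits over all commits in the repository. How values are aggregated across several shared repositories is not shown in the transcript and is left out. Function names and the second data set are hypothetical.

```python
def jcsr(commits_by_dev):
    """Pair (2 developers) relative to all developers in one repository."""
    return 2 / len(commits_by_dev)


def jcosr(commits_by_dev, a, b):
    """Commits by the pair relative to all commits in one repository."""
    total = sum(commits_by_dev.values())
    return (commits_by_dev[a] + commits_by_dev[b]) / total


# Mirrors the slide's first example: (15 + 3) / 18 = 1 and 2 / 2 = 1.
repo_two_devs = {"dev_x": 15, "dev_y": 3}
# Hypothetical repository with a third contributor.
repo_three_devs = {"dev_x": 40, "dev_y": 40, "dev_z": 80}

jcsr_two = jcsr(repo_two_devs)
jcosr_two = jcosr(repo_two_devs, "dev_x", "dev_y")
jcsr_three = jcsr(repo_three_devs)
jcosr_three = jcosr(repo_three_devs, "dev_x", "dev_y")
```

Both ratios equal 1 when the pair are the only contributors and shrink as other developers join or commit more.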
127 JWCOSR - Jointly developers weighted commit to shared repositories: number of lines in the commits of a pair of developers in shared repositories. [Worked examples over commit additions and deletions; most operands not preserved: (( ) + ( )) / ( ) = 1; (( ) + ( )) / ( ) = 0,]
128 PC - Previous Collaboration Collaborations in past repositories relative to the number of developers (⅓ + ½) / 2 = 0,
129 GPC - Global Potential Contributions: potential time of collaboration between a pair of developers in the network. Example with 5 repositories (R1: 2 months, R2: 3 months, R3: 1 month, R4: 1 month, R5: 6 months): (2 + 3 + 1 + 1 + 6) / 20* = 0.65. * longest collaboration time in the network
130 Analysis and Results: the average number of connections between developers varies according to the programming language. Few pairs of developers interact in more than one repository.
131 Analysis and Results: to define a computational model to measure the strength of collaboration, we must analyze which properties best classify such strength. Combine semantic + topological metrics. More importance to strong relationships: {Tieness, Resource Allocation} + semantic.
132 Analysis and Results (JavaScript, Ruby): just one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC should be considered, because they are strongly correlated.
133 Analysis and Results (JavaScript, Ruby): just one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC should be considered, because they are strongly correlated. Individually, T_JCSR should be considered.
134 Analysis and Results (Ruby, JavaScript): just one metric among JCOSR, JCSR and PC should be considered, because they are strongly correlated.
135 Analysis and Results (Ruby, JavaScript): just one metric between AA and PA should be considered, because they are strongly correlated.
136 Analysis and Results (JavaScript, Ruby): just one metric between AA and PA should be considered, because they are strongly correlated. All of the metrics SR, GPC and JWCOSR should be considered.
137 Example: Collaboration Ranking. Tieness + Jointly developers Weighted Commit to Shared Repositories
138 Example: Collaboration Ranking
139 Conclusions: Analysis and Results. In general, a computational model to measure the strength of collaboration should consider: the metrics T_JCSR, SR, GPC and JWCOSR; just one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC; just one metric between AA and PA; just one metric among JCOSR, JCSR and PC.
141 6. Tie Strength Application
J. C. Leão, M. A. Brandão, P. O. S. V. Melo & A. H. F. Laender. Classificação de Relações Sociais para Melhorar a Detecção de Comunidades. In: BRASNAM 2017.
M. A. Brandão & M. M. Moro. Strength of Co-authorship Ties in Clusters: a Comparative Analysis. In AMW 2017.
M. O. Silva, M. A. Brandão & M. M. Moro. A Força dos Relacionamentos Pode Medir a Qualidade de Comunidades?. In SBBD - short papers, 2017.
142 How to improve community detection?
143 What is community? "A group of people living in the same place or having a particular characteristic in common." Oxford Dictionaries
144 What about community in social networks?
145 How to improve community detection?
146 Social Relationships. Similarity: neighborhood overlap. Regularity: edge persistence (temporal dimension).
147 Relationship Classes: social x random. Social: regular and similar. Random: infrequent and not very similar.
Class | Edge persistence (regular) | NO (similar)
Friends (strong) | social | social
Acquaintances (weak) | random | social
Bridge | social | random
Random | random | random
148 Social Relationship Mining Process
149 Evaluation. In the original temporal networks: topological properties; community detection. Filtering process. Filtered network x original network: topological changes; comparison between detected communities.
150 Evaluation: Process Convergence Verification. [Chart: number of relationships per class (friend, bridge, acquaintance, random) at each filtering iteration.] Noise removal iterates until converging to 0 random relationships.
151 Evaluation: Detected Communities. Community detection techniques: Louvain Modularity [Blondel et al., 2008]; Edge Betweenness [Newman and Girvan, 2004]; Greedy Optimization of Modularity [Clauset et al., 2004]; Label Propagation [Raghavan et al., 2007].
152 Evaluation: Process Quality. [Chart: modularity of the filtered, sample and original networks.]
153 Modularity Gain. [Chart: original and 2-filtered networks.]
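Modularity, the quality measure used in this evaluation, can be computed directly for a given partition. The sketch below uses the standard Newman-Girvan form for an unweighted, undirected graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The toy graph and function name are hypothetical, and the sketch only scores a partition; it does not detect communities.

```python
from collections import Counter


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    m = len(edges)
    intra = Counter()   # edges with both endpoints in the same community
    degree = Counter()  # total degree of the nodes in each community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Comparing the score of the communities found on the original network with the score on the filtered network gives the kind of modularity gain the slide refers to.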
154 Conclusion (tie strength application): the mining process removed the noise. Random relationships impact community formation. Experiments were performed on real datasets with state-of-the-art clustering algorithms.
156 7. Final Thoughts and Future Work
157 Final Thoughts about Social Professional Networks: exploring heterogeneous networks; combining data from different networks
158 Final Thoughts about Collaboration Tie Strength: measuring tie strength; aspects that impact the strength of ties; tie strength for temporal networks; varying over time
159 Final Thoughts about Tie Strength Applications: measuring and improving clustering quality; predicting links
160 Final Thoughts: measuring tie strength; different properties vs. general networks
161 Future Work and Open Problems: expanding the study to other collaboration social networks; using qualitative research to evaluate tie strength; evaluating tie strength methods by comparing with synthetic data
162 Future Work and Open Problems: clustering analyses and evaluation; differentiating the parameter of each property in temporal_tieness; adding other social network features to STACY; group recommendation
163 Future Work and Open Problems: considering semantic and multiple features; identifying data veracity; capturing nonprofessional social behavior; temporal social professional networks
164 Acknowledgements
165 Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength Michele A. Brandão & Mirella M. Moro {micheleabrandao,