Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength

1 Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength

2 Authors: Mirella M. Moro is an associate professor in the Computer Science Department at UFMG (Belo Horizonte, Brazil). Michele A. Brandão is currently at UFMG as a substitute professor (professor substituto) and postdoctoral resident.

3 1. Introduction. Context: social networks capture relationships and interactions among individuals. Their models and patterns help solve different problems.

6 Everybody knows: what an online social network is; the most used ones; who uses them (next slide); how to use them; how to profit from them (?); who the most influential people are (?); how it evolves (?)

7 Three-quarters of Facebook users and half of Instagram users use each site daily. Chart: among the users of each social networking site, % who use these sites.

8 Number of social network users in selected countries in 2017 and 2022 (in millions)

9 Social Networks Value: The people with whom we interact on a regular basis, and even some with whom we interact only sporadically, influence our beliefs, decisions and behaviors. (Matthew O. Jackson: An Overview of Social Networks and Economic Applications. In: Handbook of Social Economics. Edited by Jess Benhabib, Alberto Bisin and Matthew O. Jackson. Elsevier.)

10 Social Networks Value: Examples of the effects of social networks on economic activity are abundant and pervasive, including roles in transmitting information about jobs, new products, technologies, and political opinions. (Jackson: An Overview of Social Networks and Economic Applications.)

11 Social Networks Value: Networks of relationships among various firms and political organizations affect research and development, patent activity, trade patterns, and political alliances. (Jackson: An Overview of Social Networks and Economic Applications.)

12 Social Networks Value: Given the many roles of networks in economic activity, they have become increasingly studied by economists. (Jackson: An Overview of Social Networks and Economic Applications.)

13 Example (Brazil): mining Twitter data for outbreak detection, to predict future outbreaks and plan combat actions properly. Image: pixabay.com

14 Outline: Introduction; Background; General taxonomy for social networks; Tie strength over academic social networks; Collaboration strength metrics and analyses on GitHub; Tie strength application; Final thoughts and future work

15 How to Model Social Networks?

16 Social Networks Models: Homogeneous

17 Social Networks Models: Bipartite (heterogeneous)

18 Social Networks Models: Multipartite (heterogeneous)

19 Social Networks Models: Multigraphs (heterogeneous)

20 Social Networks Models: Multilayer (heterogeneous)

21 Heterogeneous network: GitHub example. Networks shown: Ruby Developer Network, JavaScript Developer Network, Python Developer Network. Relationships: based on contributing to the same repository; user X follows user Y; user X opened an issue assigned to Y; users commented on the same issue; user X watches a repository owned by Y.

22 Why use social network data?

24 + Geographic information

25 R. Caldelli, R. Becarelli and I. Amerini. Image Origin Classification Based on Social Network Provenance in IEEE Transactions on Information Forensics and Security, vol. 12, no. 6, pp ,

26 + Geographic information C.-K. Hsieh et al. Immersive Recommendation: News and Event Recommendations Using Personal Digital Traces. In Proceedings of the 25th International Conference on World Wide Web, 51-62,

27 What are the properties in social networks?

28 Nodes, edges, degree, betweenness, closeness, neighborhood overlap, PageRank, interaction frequency, clustering coefficient, path length
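
Two of the properties listed above, degree and clustering coefficient, can be illustrated with a minimal sketch. The toy graph, the function names and the use of the local clustering coefficient (fraction of a node's neighbor pairs that are themselves connected) are assumptions made for illustration, not taken from the slides.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of neighbors of a node."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Local clustering: linked neighbor pairs / all neighbor pairs."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy network: a triangle (a, b, c) plus a pendant node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)
print(degree(adj, "c"), clustering_coefficient(adj, "c"))
```

Node c has three neighbors but only one of the three neighbor pairs is connected, so its local clustering is lower than that of a or b.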

29 We focus on tie strength 29

31 3. General Taxonomy for SN M. A. Brandão & M. M. Moro. Social Professional Network: a Survey and Taxonomy. Journal of Computer Communications (COMCOM), v. 100, p. 20, 2017

32 Social Networks Research Topics (tasks and issues): crawling, storing, managing and treating data from the networks; the way the networks can be analyzed, used, improved and applied

33 Issues: problems within social networks regarding their maintenance and usage. Tasks: problems whose solutions benefit from using SN data.

36 Social Networks 36

38 Social Professional Networks. Motivation: more than 20 websites for social professional purposes. Additional challenge: emotional reasons. Different studies.

43 4. Tie strength over academic social networks
M. A. Brandão & M. M. Moro. Analyzing the Strength of Co-authorship Ties with Neighborhood Overlap. In DEXA 2015
M. A. Brandão & M. M. Moro. Neighborhood Overlap: Can This Metric Be Used to Characterize the Strength of Co-authorship Ties? In Grace Hopper Celebration, 2015
M. A. Brandão, M. A. Diniz & M. M. Moro. Using Topological Properties to Measure the Strength of Co-authorship Ties. In BraSNAM 2016
M. A. Brandão & M. M. Moro. The Strength of Co-authorship Ties through Different Topological Properties. JBCS, 23(1):5, 2017
M. A. Brandão, P. O. S. V. Melo & M. M. Moro. Tie Strength Dynamics over Temporal Co-authorship Social Networks. In: IEEE/WIC/ACM Web Intelligence, 2017

44 Context. Current trend: social network analyses. Central aspect: tie strength.

45 1. How to measure tie strength?

46 "Tie strength may be measured by a combination of the amount of time, the cooperation intensity and the reciprocal services that characterize the tie" [Granovetter, 1973]

47 2. How to analyze tie strength?

48 Overview

49 Research Questions: 1. How can we measure the strength of co-authorship ties? 2. Can we identify which aspects affect the strength of co-authorship ties? 3. How is tie strength defined for temporal networks? 4. How much does the strength of ties vary over time?

50 Relevance: relevant knowledge; different applications; academic context (ranking, name disambiguation)

51 Related Work

52 Related Work over Tie Strength. Tie strength overview: Granovetter [1973]. Work ties: [Castilho et al., 2017]. Friendship ties: [Seo et al., 2017; Zignani et al., 2016]. Contact ties: [Wiese et al., 2015]. Developer ties: [Alves et al., 2016].

53 Related Work over Tie Strength: temporal networks. Temporal aspects have not been widely explored yet. Kostakos [2009] and Nicosia et al. [2013] propose a set of network properties that consider the temporal aspect.

54 Preliminary Study: The Strength of Co-authorship Ties

56 Dataset: distribution of the number of co-authors for researchers in each area

57 Dataset: distribution of the number of co-authors for researchers in each area. In sociology, 83.96% of the 7,195 publications have only one author.

58 Neighborhood Overlap: Characterizing Tie Strength

59 Impact of Properties (correlation and regression analyses). Per property, in the areas CS, Med and Soc: clustering coefficient: SLC, SLC, SLC; edge betweenness: EC, EC, EC; number of triangles: SLC; eigenvector: SLC; closeness: EC; eccentricity: EC. SLC: strongly linearly correlated. EC: exponentially correlated.

60 Tie Strength Properties. Neighborhood overlap (NO). Absolute frequency of interaction (edge weight, W): co-authorship frequency.
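
As a hedged illustration of the two properties on this slide, the sketch below builds a co-authorship graph from a list of papers and computes W (how many papers a pair co-authored) and NO. It assumes the commonly used definition of neighborhood overlap, shared neighbors divided by all neighbors of either endpoint excluding the endpoints themselves; the paper list and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthorship_graph(papers):
    """papers: iterable of author lists. Returns (adjacency, edge weights W)."""
    adj = {}
    weights = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
            weights[(a, b)] += 1  # W: co-authorship frequency of the pair
    return adj, weights

def neighborhood_overlap(adj, a, b):
    """NO(a, b) = |N(a) & N(b)| / |(N(a) | N(b)) - {a, b}| (assumed definition)."""
    common = adj[a] & adj[b]
    union = (adj[a] | adj[b]) - {a, b}
    return len(common) / len(union) if union else 0.0

# Hypothetical data: three papers and their author lists.
papers = [["ana", "bob", "cid"], ["ana", "bob"], ["bob", "dan"]]
adj, w = coauthorship_graph(papers)
print(w[("ana", "bob")], neighborhood_overlap(adj, "ana", "bob"))
print(w[("bob", "dan")], neighborhood_overlap(adj, "bob", "dan"))
```

In this toy data the pair (bob, dan) has W = 1 but NO = 0 because the two share no co-author, which is the kind of situation a metric based on NO alone cannot distinguish from a non-existent tie.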

61 Tie Strength over Non-temporal Social Network

62 Problems with NO and W. Case 1: no common co-author

63 Problems with NO and W. Case 2: no indication of whether the tie is inside a community or not

64 Problems with NO and W. Case 3: many common co-authors

65 Problems with NO and W. Case 4: results too small or too high

66 Dataset
Dataset | Number of nodes | Number of edges | Period
DBLP Articles | 837,583 | 2,935, | to 2015
DBLP Inproceedings | 945,297 | 3,760, | to 2015
PubMed | 443,784 | 5,550, | to 2016
APS | 180, | , | to

67 Correlation Coefficients between neighborhood overlap and co-authorship frequency: Kendall, Pearson and Spearman, for DBLP Articles, DBLP Inproceedings, PubMed and APS

68 Tieness: our new metric

72 Tieness Overall level of tieness in a SN 72

73 Tieness: real case. Top 10 co-authors of José Palazzo M. de Oliveira (columns: co-author, co-authorship frequency, NO, tieness): Leandro Krug Wives, Ana Marilza Pernas, Stanley Loh, Isabela Gasparini, Daniel Lichtnow, Marcelo Soares Pimenta, Giseli Rabello Lopes, Alencar Machado, José Valdeni de Lima, Mirella M. Moro

75 Tieness: real case (same top 10 table, comparing NO and tieness). How can we differentiate the two co-authors?

77 Tieness: nominal scale; analysis of agreement with Granovetter's theory

78 Until now: Introduction; Background; General taxonomy for social networks; Tie strength over academic social networks (tie strength in non-temporal networks)

79 Tie Strength over Temporal Social Network

80 Time

81 A Justification for Using the Temporal Aspect. Most work focuses only on strong and weak ties (e.g. Granovetter [1973]). RECAST (Vaz de Melo et al. [2015]) is slow and does not consider the duration of the interactions. Other approaches need social network data (e.g. chat message history).

82 Background: static aggregated graph versus dynamic graph (snapshots at t-1 and t)

83 RECAST (Random rElationship ClASsifier sTrategy): random graphs; two features (neighborhood overlap and edge persistence); small networks
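
Edge persistence, the temporal feature named above, can be sketched as the fraction of time steps in which a pair interacts. That reading, the yearly snapshots and all names below are assumptions for illustration; the slides do not spell out the formula.

```python
def to_snapshot(pairs):
    """Store one time step as a set of undirected edges."""
    return {frozenset(p) for p in pairs}

def edge_persistence(snapshots, a, b):
    """Fraction of snapshots (time steps) in which edge (a, b) appears."""
    if not snapshots:
        return 0.0
    edge = frozenset((a, b))
    hits = sum(1 for edges in snapshots if edge in edges)
    return hits / len(snapshots)

# Hypothetical dynamic graph: one snapshot per year.
snapshots = [
    to_snapshot([("ana", "bob"), ("bob", "cid")]),  # year 1
    to_snapshot([("ana", "bob")]),                  # year 2
    to_snapshot([("ana", "bob"), ("cid", "dan")]),  # year 3
    to_snapshot([("bob", "cid")]),                  # year 4
]
print(edge_persistence(snapshots, "ana", "bob"))
print(edge_persistence(snapshots, "bob", "cid"))
```

A static aggregated graph would only record that both pairs are connected; the per-snapshot view shows that one pair interacts in three of four years and the other in two.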

84 fast-recast: build more than one random graph at a time; compute edge persistence and topological overlap in parallel; optimize memory use

85 fast-recast (fast Random rElationship ClASsifier sTrategy)
Class | Properties
Friends (strong) | Frequent interactions and many common friends
Acquaintances (weak) | Infrequent interactions and many common friends
Bridge | Frequent interactions and not many common friends
Random | Infrequent interactions and not many common friends
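
The four classes in this table amount to a two-feature decision rule, sketched below. The fixed thresholds are placeholders introduced for this example only; per the slides the method relies on random graphs, which a simple cutoff does not reproduce.

```python
def classify_recast(persistence, overlap, p_threshold, o_threshold):
    """Map (edge persistence, neighborhood overlap) to one of four classes.

    p_threshold and o_threshold are hypothetical cutoffs separating
    frequent from infrequent interactions and many from few common friends.
    """
    frequent = persistence > p_threshold
    many_common = overlap > o_threshold
    if frequent and many_common:
        return "friends"
    if many_common:
        return "acquaintances"
    if frequent:
        return "bridge"
    return "random"

# Hypothetical feature values for four ties, with both cutoffs at 0.3.
for p, o in [(0.8, 0.6), (0.1, 0.6), (0.8, 0.0), (0.1, 0.0)]:
    print(p, o, classify_recast(p, o, 0.3, 0.3))
```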

86 fast-recast: results for the PubMed dataset

87 fast-recast: relationship classes for DBLP Articles, DBLP Inproceedings, PubMed and APS

88 Our new algorithm

89 STACY (Strength of Ties Automatic-Classifier over the Years). Three features: neighborhood overlap (number of common friends), edge persistence (interaction frequency) and co-authorship frequency. Random graphs.

90 STACY 90

91 STACY Relationship classes
Class | Social interaction frequency | Common friends | Interaction intensity
Class1 - strong | high | high | high
Class2 - bridge+ | high | low | high
Class3 - transient | low | high | high
Class4 - periodic | high | high | low
Class5 - bursty | low | low | high
Class6 - bridge | high | low | low
Class7 - weak | low | high | low
Class8 - random | low | low | low
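
The eight STACY classes can be read as a lookup over three high/low features. The sketch below assumes the feature order (social interaction frequency, common friends, interaction intensity). That order is inferred from the fast-recast definitions, where a bridge has frequent interactions but few common friends and a weak tie the opposite; it is an interpretation of the slide rather than a confirmed specification. How each feature is judged high or low is left outside the sketch.

```python
# Keys: (frequent interactions?, many common friends?, intense collaboration?)
# The column order is an assumption, see the paragraph above.
STACY_CLASSES = {
    (True, True, True): "strong",
    (True, False, True): "bridge+",
    (False, True, True): "transient",
    (True, True, False): "periodic",
    (False, False, True): "bursty",
    (True, False, False): "bridge",
    (False, True, False): "weak",
    (False, False, False): "random",
}

def classify_stacy(frequent, many_common_friends, intense):
    """Return the STACY class name for three boolean feature levels."""
    key = (bool(frequent), bool(many_common_friends), bool(intense))
    return STACY_CLASSES[key]

print(classify_stacy(True, False, False))
```

Under this reading, dropping the third feature collapses the eight classes back to the four fast-recast classes (for example, strong and periodic both map to friends).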

92 STACY Relationship classes: charts for DBLP Articles and DBLP Inproceedings (legend: 1 - strong, 2 - bridge+, 3 - transient, 4 - periodic, 5 - bursty, 6 - bridge, 7 - weak, 8 - random)

93 STACY Relationship classes: charts for PubMed and APS (same legend)

97 Temporal Networks: Tie Strength Concept. Strong ties: persistent. Weak ties: sporadic.

98 Tie Strength Varying over Time (80%-20%). Link persistence under fast-recast and STACY for DBLP Articles, DBLP Inproceedings, PubMed and APS. STACY classifies strong ties better than fast-recast. Strong ties and bridges persist more than others.

99 Tie Strength Varying over Time (50%-50%, PubMed). Link transformation: for each class (strong, bridge+, transient, periodic, bursty, bridge, weak, random), how many ties become periodic, become bridge, or disappear. Most ties tend to disappear. STACY reveals that ties tend to change to class 4 (periodic) and class 6 (bridge). The paper provides more details.

100 Deriving a new computational model

101 Temporal_tieness: low computational cost

102 Temporal_tieness: low computational cost. Edge persistence.

103 Temporal_tieness: low computational cost. Neighborhood overlap (or topological overlap).

104 Temporal_tieness: low computational cost. Edge weight.

105 Conclusion: measuring the strength of co-authorship ties; aspects that affect the strength of ties; tie strength for temporal networks; variation over time

106 Conclusion: methods to automatically detect relationship classes; differentiate random from social relationships

108 5. Collaboration Strength Metrics and Analyses on GitHub
N. A. Batista, M. A. Brandão, G. B. Alves, A. P. C. Silva & M. M. Moro. Collaboration Strength Metrics and Analyses on GitHub. In: 2017 IEEE/WIC/ACM Web Intelligence, 2017
N. A. Batista, G. B. Alves, A. Gonzaga & M. Brandão. GitSED: Um Conjunto de Dados com Informações Sociais baseado no GitHub. In SBBD DSW, 2017
G. B. Alves, M. A. Brandão, D. M. Santana, A. P. C. Silva & M. M. Moro. The Strength of Social Coding Collaboration on GitHub. In SBBD - Short papers,

109 Context: social coding is a software development approach that enables collaboration among developers.

110 Relevance: evaluating the strength of the relationships between developers can help improve the recommendation of developers to work on a project and fix bugs, and the productivity analysis of a development team.

111 How to measure the strength of collaboration between developers on GitHub?

112 Related Work. Casalnuovo et al. [2015] analyze the productivity of developers in projects. Bartusiak et al. [2016] predict developer collaboration. Tsay et al. [2014] investigate the acceptance of pull requests.

113 Related Work. Such studies measure the strength of interactions differently. However, none evaluates the best way to measure such strength and none investigates the correlation among such metrics.

114 Contributions. Analyze the strength of social collaboration measured by distinct metrics across different programming languages, giving insights on relationship patterns.
1. GitSED - GitHub Socially Enhanced Dataset: curated (filtered on two programming languages), augmented (data not available on GHTorrent), and enriched (social network information)
2. Three new metrics for the strength of social coding collaboration: number of lines in commits, potential of contribution, prior social interaction
3. Evaluation of all metrics over two social networks, for JavaScript and Ruby

115 Database. GHTorrent: database from September 2015. Not including: forked and deleted projects. Two programming languages: JavaScript and Ruby.

118 Database. GitSED: GitHub Socially Enhanced Dataset

119 Collaboration Network. Links: two developers contribute to the same repository. Weights: given by topological and semantic metrics.

122 Collaboration Network: weights assigned by the proposed metrics

123 Topological Properties. Clustering coefficient: the tendency of the nodes to cluster. Neighborhood overlap: computes the strength of the links. Adamic-Adar: gives more weight to low-degree common neighbors. Preferential attachment: the rich get richer. Resource allocation: how a node indirectly influences its pair's neighborhood. Tieness: combination of neighborhood overlap and weight.

124 SR - Number of Shared Repositories: number of repositories shared by a pair of developers. Example: 5 repositories.

125 JCSR - Jointly developers contribution to shared repositories: contribution of a pair of developers relative to the other developers in the same repository. Examples: 2/2 = 1; 2/3 = 0.66.
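
A minimal sketch of SR and of the per-repository JCSR ratio follows. It reads the slide's examples (2/2 and 2/3) as 2 divided by the number of contributors of a shared repository; how the per-repository values are combined across several shared repositories is not stated on the slide, so the sketch returns them unaggregated. Repository and developer names are hypothetical.

```python
def shared_repositories(repos, a, b):
    """Names of the repositories to which both a and b contributed."""
    return sorted(name for name, devs in repos.items() if a in devs and b in devs)

def sr(repos, a, b):
    """SR: number of repositories shared by the pair."""
    return len(shared_repositories(repos, a, b))

def jcsr_per_repository(repos, a, b):
    """For each shared repository, 2 / number of contributors (assumed reading)."""
    return {name: 2.0 / len(repos[name]) for name in shared_repositories(repos, a, b)}

# Hypothetical contributor sets.
repos = {
    "r1": {"x", "y"},
    "r2": {"x", "y", "z"},
    "r3": {"x", "z"},
}
print(sr(repos, "x", "y"))
print(jcsr_per_repository(repos, "x", "y"))
```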

126 JCOSR - Jointly developers commits to shared repositories: number of commits of a pair of developers in shared repositories. Examples: (15 + 3) / 18 = 1; a second example has 160 commits in the denominator and a result below 1.
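
The first example on this slide, (15 + 3) / 18 = 1, suggests that JCOSR for one repository is the pair's commits divided by all commits in that repository. The sketch below implements that reading for a single repository; the commit counts other than 15 and 3 are hypothetical, and the aggregation across repositories is left out because the slide does not state it.

```python
def jcosr(commits_by_developer, a, b):
    """Share of a repository's commits made by the pair (a, b)."""
    total = sum(commits_by_developer.values())
    if total == 0:
        return 0.0
    pair = commits_by_developer.get(a, 0) + commits_by_developer.get(b, 0)
    return pair / total

# Slide example: the pair made all 18 commits.
only_pair = {"x": 15, "y": 3}
# Hypothetical variation: a third developer made as many commits as the pair.
with_third = {"x": 15, "y": 3, "z": 18}
print(jcosr(only_pair, "x", "y"), jcosr(with_third, "x", "y"))
```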

127 JWCOSR - Jointly developers weighted commit to shared repositories: number of lines (additions and deletions) in the commits of a pair of developers in shared repositories

128 PC - Previous Collaboration: collaborations in past repositories relative to the number of developers. Example: (⅓ + ½) / 2
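
The PC example, (⅓ + ½) / 2, can be read as the mean of 1/n over the pair's past shared repositories, where n is the number of developers in each repository. That interpretation, and the function name, are assumptions made for this sketch.

```python
def previous_collaboration(team_sizes):
    """Mean of 1/n over past shared repositories with n developers each."""
    if not team_sizes:
        return 0.0
    return sum(1.0 / n for n in team_sizes) / len(team_sizes)

# Slide example: two past repositories, with 3 and 2 developers.
print(previous_collaboration([3, 2]))
```

Under this reading, collaborating in small teams contributes more to PC than sharing a repository with many other developers.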

129 GPC - Global Potential Contributions: potential time of collaboration between a pair of developers in the network. Example: 5 repositories (R1: 2 months, R2: 3 months, R3: 1 month, R4: 1 month, R5: 6 months); with 20 months as the longest collaboration time in the network, GPC = 0.65.
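
The GPC example can be checked numerically: the five listed durations sum to 13 months, and 13 / 20 = 0.65 matches the value on the slide. The sketch assumes GPC is the pair's total collaboration time divided by the longest collaboration time observed in the network; the function and parameter names are hypothetical.

```python
def global_potential_contribution(durations_in_months, longest_in_network):
    """Total collaboration time of a pair, normalized by the network maximum."""
    if longest_in_network <= 0:
        raise ValueError("longest_in_network must be positive")
    return sum(durations_in_months) / longest_in_network

# Slide example: R1..R5 last 2, 3, 1, 1 and 6 months; network maximum is 20.
print(global_potential_contribution([2, 3, 1, 1, 6], 20))
```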

130 Analysis and Results. The average number of connections between developers varies according to the programming language. Few pairs of developers interact in more than one repository.

131 Analysis and Results. To define a computational model to measure the strength of collaboration, we must analyze which properties best classify such strength. Combine semantic and topological metrics. Give more importance to strong relationships: {Tieness, Resource Allocation} + semantic.

132 Analysis and Results (JavaScript and Ruby). Only one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC should be considered, because they are strongly correlated.

133 Analysis and Results (JavaScript and Ruby). Only one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC should be considered, because they are strongly correlated. Individually, T_JCSR should be considered.

134 Analysis and Results (Ruby and JavaScript). Only one metric among JCOSR, JCSR and PC should be considered, because they are strongly correlated.

135 Analysis and Results (Ruby and JavaScript). Only one of AA and PA should be considered, because they are strongly correlated.

136 Analysis and Results (JavaScript and Ruby). Only one of AA and PA should be considered, because they are strongly correlated. The metrics SR, GPC and JWCOSR should all be considered.

137 Example: Collaboration Ranking using Tieness + JWCOSR (Jointly developers weighted commit to shared repositories)

138 Example: Collaboration Ranking

139 Conclusions: Analysis and Results. In general, a computational model to measure the strength of collaboration should consider: the metrics T_JCSR, SR, GPC and JWCOSR; only one metric among T_SR, T_JCOSR, T_JWCOSR, T_PC and T_GPC; only one of AA and PA; only one metric among JCOSR, JCSR and PC.

141 6. Tie Strength Application
J. C. Leão, M. A. Brandão, P. O. S. V. Melo & A. H. F. Laender. Classificação de Relações Sociais para Melhorar a Detecção de Comunidades. In: BRASNAM 2017
M. A. Brandão & M. M. Moro. Strength of Co-authorship Ties in Clusters: a Comparative Analysis. In AMW 2017
M. O. Silva, M. A. Brandão & M. M. Moro. A Força dos Relacionamentos Pode Medir a Qualidade de Comunidades?. In SBBD - short papers, 2017

142 How to improve community detection?

143 What is a community? "A group of people living in the same place or having a particular characteristic in common." Oxford Dictionaries

144 What about communities in social networks?

146 Social Relationships. Similarity: neighborhood overlap. Regularity: edge persistence (temporal dimension).

147 Relationship Classes: social (regular and similar) versus random (infrequent and not very similar)
Class | Edge persistence (regular) | NO (similar)
Friends (strong) | social | social
Acquaintances (weak) | random | social
Bridge | social | random
Random | random | random

148 Social Relationship Mining Process

149 Evaluation. In the original temporal networks: topological properties; community detection. Filtering process. Filtered network versus original network: topological changes; comparison between detected communities.

150 Evaluation: Process Convergence Verification. Chart: number of relationships per class (friend, bridge, acquaintance, random) across noise removal iterations (1st to 6th), until converging to 0 random relationships.

151 Evaluation: Detected Communities. Community detection techniques: Louvain Modularity [Blondel et al., 2008]; Edge Betweenness [Newman and Girvan, 2004]; Greedy Optimization of Modularity [Clauset et al., 2004]; Label Propagation [Raghavan et al., 2007].
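
Several of the techniques above optimize, or are evaluated with, modularity. A minimal sketch of that score for a given partition of an unweighted, undirected graph is shown below, using the standard form Q = sum over communities of (internal edges / m) minus (community degree sum / 2m) squared, where m is the number of edges. The toy graph and partition are hypothetical and do not come from the study's datasets.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}    # edges with both endpoints inside each community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2.0 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Hypothetical graph: two triangles joined by a single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {n: "A" for n in range(1, 7)}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which is the kind of comparison a filtered versus original network evaluation relies on.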

152 Evaluation: Process Quality. Chart: modularity of the filtered, sample and original networks.

153 Chart: modularity gain per network, original versus 2-filtered.

154 Conclusion (tie strength application): the mining process removed the noise; random relationships affect community formation; experiments were performed on real datasets with state-of-the-art clustering algorithms.

156 7. Final Thoughts and Future Work

157 Final Thoughts About Social Professional Networks: exploring heterogeneous networks; combining data from different networks

158 Final Thoughts About Collaboration Tie Strength: measuring tie strength; aspects that affect the strength of ties; tie strength for temporal networks; variation over time

159 Final Thoughts About Tie Strength Applications: measuring and improving clustering quality; predicting links

160 Final Thoughts: measuring tie strength; different properties vs general networks

161 Future Work and Open Problems: expanding the study to other collaboration social networks; using qualitative research to evaluate tie strength; evaluating tie strength methods by comparing with synthetic data

162 Future Work and Open Problems: clustering analyses and evaluation; differentiating the parameter of each property in temporal_tieness; adding other social network features to STACY; group recommendation

163 Future Work and Open Problems: considering semantic and multiple features; identifying data veracity; capturing nonprofessional social behavior; temporal social professional networks

164 Acknowledgements

165 Social Professional Networks: Taxonomy, Metrics and Analyses of Relationship Strength Michele A. Brandão & Mirella M. Moro {micheleabrandao,