CMU SCS Mining Large Social Networks: Patterns and Anomalies

1 Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU

2 Thank you The Department of Informatics Happy 20th! Prof. Yannis Manolopoulos Prof. Kostas Tsichlas Mrs. Nina Daltsidou C. Faloutsos (CMU) 2

3 International-caliber friends among AUTH alumni: Prof. Evimaria Terzi (U. Boston); Prof. Kyriakos Mouratidis (SMU); Dr. Michalis Vlachos (IBM)

4 Outline: Introduction, Motivation; Problem #1: Patterns in graphs; Problem #2: Tools; Problem #3: Scalability; Conclusions

5 Graphs - why should we care? $10s of BILLIONS revenue >500M users; Food Web [Martinez '91]; Internet Map [lumeta.com]

6 Graphs - why should we care? IR: bi-partite graphs (doc-terms) [figure: documents D1 ... DN linked to terms T1 ... TM]; web: hyper-text graph; ... and more:

7 Graphs - why should we care? web-log ("blog") news propagation; computer network security: IP traffic and anomaly detection; ... [subject-verb-object: graph]. Graph == relational table with 2 columns (src, dst). BIG DATA: big graphs
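
The "graph == relational table with 2 columns (src, dst)" remark on slide 7 can be illustrated with a minimal Python sketch. The edge rows and node names below are made up for illustration, and the helper `degree_tables` is hypothetical; the slides themselves contain no code.

```python
from collections import defaultdict

# Hypothetical edge table: one (src, dst) row per directed edge.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "alice"),
]

def degree_tables(rows):
    """Return (out_degree, in_degree) dicts from a single scan of the table."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for src, dst in rows:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(out_deg), dict(in_deg)

out_deg, in_deg = degree_tables(edges)
```

Because every row is independent, the same scan works on a table far larger than this toy one, which is what makes the two-column view convenient for large graphs.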

8 Outline: Introduction, Motivation; Problem #1: Patterns in graphs (Static graphs, Weighted graphs, Time evolving graphs); Problem #2: Tools; Problem #3: Scalability; Conclusions

9 Problem #1 - network and graph mining: What does the Internet look like? What does Facebook look like? What is normal / abnormal? Which patterns/laws hold?

10 Graph mining: Are real graphs random?

11 Laws and patterns: Are real graphs random? A: NO!! Diameter; in- and out-degree distributions; other (surprising) patterns. So, let's look at the data

12 Solution# S.1: Power law in the degree distribution [SIGCOMM'99] [plot: log(degree) vs. log(rank) for internet domains; labeled points: att.com, ibm.com]
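
The log(degree) versus log(rank) plot on this slide can be approximated numerically. The sketch below is an illustration only: it fits a least-squares line in log-log space, which is a simple but crude way to estimate a power-law exponent, and it runs on synthetic degrees rather than the internet-domain data from the talk. The function name `rank_degree_slope` is hypothetical.

```python
import math

def rank_degree_slope(degrees):
    """Least-squares slope of log(degree) versus log(rank).

    Nodes are sorted by decreasing degree; rank 1 is the highest-degree node.
    Nodes with degree 0 are skipped because log(0) is undefined.
    """
    ordered = sorted((d for d in degrees if d > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two nodes with positive degree")
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(d) for d in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic degrees following degree = 1000 / rank, so the slope is close to -1.
synthetic = [1000.0 / r for r in range(1, 201)]
slope = rank_degree_slope(synthetic)
```

A roughly straight line in this log-log view is the visual signature of a power law; real data will scatter around the fitted line.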

14 But: How about graphs from other domains?

15 More power laws: web hit counts [w/ A. Montgomery] [plot "Web Site Traffic": count (log scale) vs. in-degree (log scale); Zipf; annotations: "ebay", users, sites]

16 And numerous more: Who-trusts-whom (epinions.com); Income [Pareto] distribution; Duration of downloads [Bestavros+]; Duration of UNIX jobs ("mice and elephants"); Size of files of a user; Black swans

17 Outline: Introduction, Motivation; Problem #1: Patterns in graphs (Static graphs: degree, diameter, eigen, Triangles; Time evolving graphs); Problem #2: Tools

18 Solution# S.3: Triangle Laws. Real social networks have a lot of triangles

19 Solution# S.3: Triangle Laws. Real social networks have a lot of triangles: friends of friends are friends. Any patterns?

20 Triangle Law: #S.3 [Tsourakakis ICDM 2008] [plots: Reuters, SN, Epinions; X-axis: degree; Y-axis: mean # triangles] n friends -> ~n^1.6 triangles
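
The quantity plotted on this slide (mean number of triangles per node, grouped by degree) can be computed exactly on a small graph as follows. This is a sketch under stated assumptions: the graph is undirected and simple, the toy edge list is invented, and both function names are hypothetical. It is not the method used for the large graphs in the talk.

```python
from collections import defaultdict

def triangles_per_node(edges):
    """Return (triangle_count, degree) dicts for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for node, nbrs in adj.items():
        # Each triangle through `node` is found once per neighbour, hence // 2.
        links = sum(len(nbrs & adj[n]) for n in nbrs)
        counts[node] = links // 2
    return counts, {n: len(s) for n, s in adj.items()}

def mean_triangles_by_degree(edges):
    """Map each degree value to the mean triangle count of nodes with that degree."""
    counts, degree = triangles_per_node(edges)
    total = defaultdict(int)
    nodes = defaultdict(int)
    for n, d in degree.items():
        total[d] += counts[n]
        nodes[d] += 1
    return {d: total[d] / nodes[d] for d in total}

# Toy graph: one triangle (1, 2, 3) plus a pendant edge (3, 4).
toy = [(1, 2), (2, 3), (1, 3), (3, 4)]
by_degree = mean_triangles_by_degree(toy)
```

Plotting `by_degree` on log-log axes for a real social network is how one would check for a relation such as the ~n^1.6 law quoted on the slide.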

21 Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD'11]

24 Outline: Introduction, Motivation; Problem #1: Patterns in graphs (Static graphs, Time evolving graphs); Problem #2: Tools

25 Problem: Time evolution, with Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell CMU)

26 T.1 Evolution of the Diameter: Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N); diameter ~ O(log log N). What is happening in real data?

27 T.1 Evolution of the Diameter: Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N); diameter ~ O(log log N). What is happening in real data? Diameter shrinks over time

28 T.1 Diameter, Patents: Patent citation network, 25 years; 2.9 M nodes, 16.5 M edges [plot: diameter vs. time [years]]
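
To make the "diameter over time" measurement concrete, here is a small Python sketch that recomputes a diameter-like statistic on cumulative snapshots of a time-stamped edge list. Several things are assumptions rather than details from the slides: the statistic used is an effective diameter, defined here as the smallest hop count within which 90% of connected node pairs fall; the graph is treated as undirected; and the toy edges are invented.

```python
from collections import defaultdict, deque

def hop_histogram(edges):
    """Histogram {hops: number of ordered reachable pairs}, via BFS from every node."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    hist = defaultdict(int)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    hist[dist[nxt]] += 1
                    queue.append(nxt)
    return dict(hist)

def effective_diameter(edges, quantile=0.9):
    """Smallest hop count covering at least `quantile` of connected pairs."""
    hist = hop_histogram(edges)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("graph has no connected pairs")
    covered = 0
    for hops in sorted(hist):
        covered += hist[hops]
        if covered >= quantile * total:
            return hops

# Toy time-stamped edges (year, u, v): a path that later gains two shortcuts.
timed_edges = [
    (1, "a", "b"), (1, "b", "c"), (1, "c", "d"), (1, "d", "e"), (1, "e", "f"),
    (2, "a", "f"), (2, "a", "d"),
]
snapshots = {
    year: effective_diameter([(u, v) for t, u, v in timed_edges if t <= year])
    for year in (1, 2)
}
```

In this toy example the value drops from 4 to 3 once the shortcut edges arrive, which mirrors in miniature the shrinking behaviour reported for the patent network.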

29 Outline: Introduction, Motivation; Problem #1: Patterns in graphs; Problem #2: Tools (Belief Propagation); Problem #3: Scalability; Conclusions

30 E-bay Fraud detection, w/ Polo Chau & Shashank Pandit, CMU [WWW'07]

33 E-bay Fraud detection - NetProbe

34 Popular press. And less desirable attention: from Belgium police ("copy of your code?")

35 Outline: Introduction, Motivation; Problem #1: Patterns in graphs; Problem #2: Tools; Problem #3: Scalability (PEGASUS); Conclusions

36 Scalability: Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, "Web Search for a Planet: The Google Cluster Architecture", IEEE Micro 2003]; Yahoo: 5Pb of data [Fayyad, KDD'07]. Problem: machine failures, on a daily basis. How to parallelize data mining tasks, then? A: map/reduce; hadoop (open-source clone)
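
The map/reduce answer on this slide can be illustrated with a single-process Python sketch. It only mimics the programming model (map, group by key, reduce) and does not use the Hadoop API; the driver `run_map_reduce`, the mapper and reducer names, and the edge records are all hypothetical. The task shown, counting node degrees from an edge list, is chosen as a simple example.

```python
from collections import defaultdict

def map_edge(edge):
    """Map step: emit (node, 1) for both endpoints of an undirected edge."""
    src, dst = edge
    yield src, 1
    yield dst, 1

def reduce_counts(key, values):
    """Reduce step: sum the partial counts collected for one node."""
    return key, sum(values)

def run_map_reduce(records, mapper, reducer):
    """Toy single-process driver: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

edge_records = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = run_map_reduce(edge_records, map_edge, reduce_counts)
```

Since each mapper call sees one record and each reducer call sees one key, a real framework can spread both steps across many machines and rerun only the pieces lost to a machine failure.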

37 Outline: Introduction, Motivation; Problem #1: Patterns in graphs; Problem #2: Tools; Problem #3: Scalability (PEGASUS, Radius plot); Conclusions

38 HADI for diameter estimation: "Radius Plots for Mining Tera-byte Scale Graphs", U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM'10. Naively: diameter needs O(N**2) space and up to O(N**3) time: prohibitive (N~1B). Our HADI: linear on E (~10B); near-linear scalability wrt # machines; several optimizations -> 5x faster
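
For contrast with HADI, the sketch below shows an exact, small-scale way to obtain the data behind a radius plot: one breadth-first search per node, which costs on the order of N * (N + E) time and therefore does not scale to billion-node graphs. It is not the HADI algorithm. The definition of a node's radius as the hop distance to its farthest reachable node is an assumption made for this illustration, and the function names and toy graph are hypothetical.

```python
from collections import Counter, defaultdict, deque

def node_radii(edges):
    """Radius of each node: hops to its farthest reachable node (undirected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    radii = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        radii[source] = max(dist.values())
    return radii

def radius_plot(edges):
    """Number of nodes per radius value: the histogram shown in a radius plot."""
    return dict(Counter(node_radii(edges).values()))

# Toy graph: a star around "hub" with one extra hop from "c" to "d".
star_with_tail = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
plot = radius_plot(star_with_tail)
diameter = max(node_radii(star_with_tail).values())
```

The diameter is the largest radius in the histogram, so a radius plot carries strictly more information than the single diameter number.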

39 [radius plot: count vs. radius] ???? 19+ [Barabasi+] ~1999, ~1M nodes

40 [radius plot: count vs. radius] ?????? 19+ [Barabasi+] ~1999, ~1M nodes. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): largest publicly available graph ever studied.

41 [radius plot: count vs. radius] 14 (dir.) ???? ~7 (undir.) 19+? [Barabasi+]. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): largest publicly available graph ever studied.

42 [radius plot: count vs. radius] 14 (dir.) ???? ~7 (undir.) 19+? [Barabasi+]. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges). 7 degrees of separation (!) Diameter: shrunk

43 [radius plot: count vs. radius] ???? ~7 (undir.). YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges). Q: Shape?

44 YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): effective diameter: surprisingly small. Multi-modality (?!)

45 Radius Plot of GCC of YahooWeb.

46 YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): effective diameter: surprisingly small. Multi-modality: probably mixture of cores.

47 Conjecture [figure labels: EN, DE, BR; ~7]. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): effective diameter: surprisingly small. Multi-modality: probably mixture of cores.

48 Conjecture [figure label: ~7]. YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges): effective diameter: surprisingly small. Multi-modality: probably mixture of cores.

49 Outline: Introduction, Motivation; Problem #1: Patterns in graphs; Problem #2: Tools; Problem #3: Scalability; Conclusions

50 OVERALL CONCLUSIONS, low level: Several new patterns (shrinking diameters, triangle-laws, etc.); New tools: fraud detection (belief propagation); Scalability: PEGASUS / hadoop

51 OVERALL CONCLUSIONS, medium-level: BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise

52 Project info: Chau, Polo; Koutra, Danae; Prakash, Aditya; Akoglu, Leman; Kang, U; McGlohon, Mary; Tong, Hanghang. Thanks to: NSF IIS , IIS , CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, ilab

53 Thank you for the honor! Congratulations on the 20th anniversary and

54 High-level conclusion: Collaborations: Sociology + CS (triangles); Civil engineering + CS (sensor placement); fMRI/medical + graphs (medical db's)

55 Never stop learning: ΓΗΡΑΣΚΩ ΑΕΙ ΔΙΔΑΣΚΟΜΕΝΟΣ ("I grow old, always learning") Socrates Plato Aristotle


Big Data The Big Story Big Data The Big Story Jean-Pierre Dijcks Big Data Product Mangement 1 Agenda What is Big Data? Architecting Big Data Building Big Data Solutions Oracle Big Data Appliance and Big Data Connectors Customer

More information

GE Intelligent Platforms. Proficy Historian HD

GE Intelligent Platforms. Proficy Historian HD GE Intelligent Platforms Proficy Historian HD The Industrial Big Data Historian Industrial machines have always issued early warnings, but in an inconsistent way and in a language that people could not

More information

CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK

CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK CONNECTING SOCIAL MEDIA TO ECOMMERCE USING MICROBLOGGING AND ARTIFICIAL NEURAL NETWORK Ms.S.P.VidhyaPriya 1,B.Gokhila 2, T.Santhiya 3, K.Saranya 4 1 M.E.,Assistant Professor-CSE, Kathir College Of Engineering,

More information

Big Data and Automotive IT. Michael Cafarella University of Michigan September 11, 2013

Big Data and Automotive IT. Michael Cafarella University of Michigan September 11, 2013 Big Data and Automotive IT Michael Cafarella University of Michigan September 11, 2013 A Banner Year for Big Data 2 Big Data Who knows what it means anymore? Big Data Who knows what it means anymore? Associated

More information

Who s Here Today? B2B Social Media: Why?

Who s Here Today? B2B Social Media: Why? Who s Here Today? Agenda B2B Social Media Going Beyond LinkedIn B2B: Why Use Social Media? Best Practices Facebook LinkedIn Twitter Analytics B2B Social Media: Why? B2B Social Media: Why? B2B Social Media:

More information

Computational Challenges for Real-Time Marketing with Large Datasets

Computational Challenges for Real-Time Marketing with Large Datasets Computational Challenges for Real-Time Marketing with Large Datasets Alan Montgomery Associate Professor Carnegie Mellon University Tepper School of Business e-mail: alan.montgomery@cmu.edu web: http://www.andrew.cmu.edu/user/alm3

More information

Advanced Topics in Computer Science 2. Networks, Crowds and Markets. Fall,

Advanced Topics in Computer Science 2. Networks, Crowds and Markets. Fall, Advanced Topics in Computer Science 2 Networks, Crowds and Markets Fall, 2012-2013 Prof. Klara Kedem klara@cs.bgu.ac.il Office 110/37 Office hours by email appointment The textbook Chapters from the book

More information

Big Data Trends to Watch

Big Data Trends to Watch Big Data Trends to Watch Bill Peterson NetApp September, 2012 1 Bill Peterson @thebillp What I hope to accomplish today... ...and avoid this. What is Big Data? Big Data refers to datasets whose volume,

More information

Behavioral Data Mining. Lecture 22: Network Algorithms II Diffusion and Meme Tracking

Behavioral Data Mining. Lecture 22: Network Algorithms II Diffusion and Meme Tracking Behavioral Data Mining Lecture 22: Network Algorithms II Diffusion and Meme Tracking Once upon a time Once upon a time Approx 1 MB Travels ~ 1 hour, B/W = 3k b/s News Today, the total news available from

More information

Cognitive Data Warehouse and Analytics

Cognitive Data Warehouse and Analytics Cognitive Data Warehouse and Analytics Hemant R. Suri, Sr. Offering Manager, Hybrid Data Warehouses, IBM (twitter @hemantrsuri or feel free to reach out to me via LinkedIN!) Over 90% of the world s data

More information

SO YOU WANT TO HIRE A DIGITAL MARKETING AGENCY...

SO YOU WANT TO HIRE A DIGITAL MARKETING AGENCY... SO YOU WANT TO HIRE A DIGITAL MARKETING AGENCY... LET S TALK GOALS. To build a successful partnership with a digital marketing agency, you need to know what you want. Having a clear picture of what your

More information

Transforming Business Needs into Business Value. Path to Agility May 2013

Transforming Business Needs into Business Value. Path to Agility May 2013 Transforming Business Needs into Business Value Path to Agility May 2013 Agile Transformation Professional services career Large scale projects Application development & Integration Project management

More information

A scalable Approach to Sizeindependent

A scalable Approach to Sizeindependent Research Dublin Dept. of Computer Science Rutgers A scalable Approach to Sizeindependent Network Similarity Michele Berlingerio Tina Eliassi- Rad Danai Koutra Christos Faloutsos WIN, September 28 th -

More information

Data Mining Applications with R

Data Mining Applications with R Data Mining Applications with R Yanchang Zhao Senior Data Miner, RDataMining.com, Australia Associate Professor, Yonghua Cen Nanjing University of Science and Technology, China AMSTERDAM BOSTON HEIDELBERG

More information

SAP Big Data. Markus Tempel SAP Big Data and Cloud Analytics Services

SAP Big Data. Markus Tempel SAP Big Data and Cloud Analytics Services SAP Big Data Markus Tempel SAP Big Data and Cloud Analytics Services Is that Big Data? 2015 SAP AG or an SAP affiliate company. All rights reserved. 2 What if you could turn new signals from Big Data into

More information

Data Science Strategies for ReaI-Time Analytics

Data Science Strategies for ReaI-Time Analytics http://www.mif.pg.gda.pl/homepages/jasiu/stud/eim/pdf/22-ind-zastosowania.pdf Data Science Strategies for ReaI-Time Analytics Kirk Borne @KirkDBorne Principal Data Scientist Booz Allen Hamilton http://www.boozallen.com/datascience

More information

Social Prediction: Can we predict users action and emotions?

Social Prediction: Can we predict users action and emotions? Social Prediction: Can we predict users action and emotions? Jie Tang Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University Email: jietang@tsinghua.edu.cn 1 Motivation

More information

Introduction to Stream Processing

Introduction to Stream Processing Introduction to Processing Guido Schmutz DOAG Big Data 2018 20.9.2018 @gschmutz BASEL BERN BRUGG DÜSSELDORF HAMBURG KOPENHAGEN LAUSANNE guidoschmutz.wordpress.com FRANKFURT A.M. FREIBURG I.BR. GENF MÜNCHEN

More information

EMC IT Big Data Analytics Journey. Mahmoud Ghanem Sr. Systems Engineer

EMC IT Big Data Analytics Journey. Mahmoud Ghanem Sr. Systems Engineer EMC IT Big Data Analytics Journey Mahmoud Ghanem Sr. Systems Engineer Agenda 1 2 3 4 5 Introduction To Big Data EMC IT Big Data Journey Marketing Science Lab Use Case Technical Benefits Lessons Learned

More information

Research on Prescriptive Analytics

Research on Prescriptive Analytics Research on Prescriptive Analytics Nov. 2014 Hanmin Jung Head of Dept. of Computer Intelligence Research, KISTI Self-introduction (Google Scholar) 2 2 Self-introduction (Academic Activities) * NABD2015

More information

From Big Data to Fast Data. Sina Sheikholeslami

From Big Data to Fast Data. Sina Sheikholeslami From Big Data to Fast Data Sina Sheikholeslami s.sheikholeslami@digikala.com CEIT GradTalks, Tehran Polytechnic May 29 2017 Overview The War on Big Data Definition The Early Days State-of-the-art Big Data

More information

Social Influence Analysis in Social Networking Big Data: Opportunities and Challenges

Social Influence Analysis in Social Networking Big Data: Opportunities and Challenges ACCEPTED FROM OPEN CALL Social Influence Analysis in Social Networking Big Data: Opportunities and Challenges Sancheng Peng, Guojun Wang, and Dongqing Xie Abstract Social influence analysis has become

More information

BIG DATA TRANSFORMS BUSINESS. Copyright 2013 EMC Corporation. All rights reserved.

BIG DATA TRANSFORMS BUSINESS. Copyright 2013 EMC Corporation. All rights reserved. BIG DATA TRANSFORMS BUSINESS 1 Big Data = Structured+Unstructured Data Internet Of Things Non-Enterprise Information Structured Information In Relational Databases Managed & Unmanaged Unstructured Information

More information

Data Analytics with HPC L01. Introduction What are data analytics, big data, data science,?

Data Analytics with HPC L01. Introduction What are data analytics, big data, data science,? Data Analytics with HPC L01 Introduction What are data analytics, big data, data science,? 1 Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0

More information

Incentivized Sharing in Social Networks

Incentivized Sharing in Social Networks Incentivized Sharing in Social Networks Joseph J. Pfeiffer III Department of Computer Science Purdue University, West Lafayette, IN, USA jpfeiffer@purdue.edu Elena Zheleva LivingSocial Washington, D.C.,

More information

Getting Big Value from Big Data

Getting Big Value from Big Data Getting Big Value from Big Data Expanding Information Architectures To Support Today s Data Research Perspective Sponsored by Aligning Business and IT To Improve Performance Ventana Research 2603 Camino

More information

A Survey on Influence Analysis in Social Networks

A Survey on Influence Analysis in Social Networks A Survey on Influence Analysis in Social Networks Jessie Yin 1 Introduction With growing popularity of Web 2.0, recent years have witnessed wide-spread adoption of rich social media applications such as

More information