Analytics Building Blocks

Similar documents
Knowledge Discovery and Data Mining

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Data Science: The Big #SQLServerUserGroupDubai

Enable Business Transformation In Banking & Financial Services Organizations through Connected Devices. Arvind Radhakrishnen.

Elements of a Successful Marketing Plan Part 1. Inbound Marketing Strategies

Modern Internet marketing in the context of online product management

Advanced Topics in Computer Science 2. Networks, Crowds and Markets. Fall,

Agenda. The Power of Social Data: Transforming Big Data into Decisions. 1. Data and Decisions. 2. Amazon as Data Refinery

The disruptive power of big data

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Turn Data into Business Value

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

Machine Learning Data Management Platforms

Microsoft Azure Essentials

DATA ROBOTICS 1 REPLY

Chapter 6. E-commerce Marketing and Advertising

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

Lecture 1: What is Data Mining?

Splunk This! - Bringing Natural Language Processing To Splunk

White Paper: VANTIQ Competitive Landscape

Aurélie Pericchi SSP APS Laurent Marzouk Data Insight & Cloud Architect

Who s Here Today? B2B Social Media: Why?

See for more details!

CHATBOTS & INTELLIGENT VIRTUAL ASSISTANTS

The Art of Optimizing Media Asset Management

Big Data The Big Story

GE Intelligent Platforms. Proficy Historian HD

BLOCKCHAIN CLOUD SERVICE. Integrate Your Business Network with the Blockchain Platform

Oracle Autonomous Data Warehouse Cloud

Harnessing Big Data for Social Media Intelligence: South African Case Study

Personalized Shopping App with Customer Feedback Mining and Secure Transaction

Learning Objectives. Learning Objectives 17/03/2016. Chapter 15 The Internet: Digital and Social Media

CS5412: THE CLOUD VALUE PROPOSITION

Analytics is the discovery and communication of meaningful patterns in data.

IoT, IoE, M2M Alphabet Soup or are we creating a Smarter Planet?

HOW MOBILE. TRANSACTIONS Speaker images KILLING SOCIAL FOR DIGITAL COMMERCE. Sponsored by

COURSE SYLLABUS & OUTLINE

A brief history of fraud. Physical theft. Sale of fake and stolen payment cards. Identity theft and application fraud

European Journal of Science and Theology, October 2014, Vol.10, Suppl.1, BIG DATA ANALYSIS. Andrej Trnka *

The Media Industry at the Digital Crossroads Artificial Intelligence to the Rescue?

Analytics with Intelligent Edge Sergey Serebryakov Research Engineer IEEE COMMUNICATIONS SOCIETY

Oracle Autonomous Data Warehouse Cloud

An Introduction to Social Analytics: Concepts and Methods

BUS 168 Chapter 6 - E-commerce Marketing Concepts: Social, Mobile, Local

4/16/2015. Mobile Recruiting: Active and Passive Recruiting Strategies. Lori Furnell. Elmer Mobley

Influencer Communities. Influencer Communities. Influencers are having many different conversations

The Economics of E-commerce and Technology. The Nature of Technology Industries

China Edge Ltd. Plan. Engage. Deliver. China Market Services for Luxury Retail & Hospitality

About The CMO Survey. Mission. Survey operation. Sponsoring organizations

SOCIAL MEDIA PLATFORMS. Best ones too use for Acumen animation studios

Social Intelligence Report Adobe Digital Index Q2 2015

Social Media Is More Than a Popularity Contest

BIZZLOC! - Locate Your Business Group Web Mining Project

Data: Foundation Of Digital Transformation

The Information Economy. The Nature of Information Goods

Data Strategy: How to Handle the New Data Integration Challenges. Edgar de Groot

(Big) Data in Healthcare A Clinician s Perspective International Symposium on Data-Driven Healthcare, 29 Aug 2017

Starting up an Online Business

An AI/ML-Enabled Defense

Genomic Data Is Going Google. Ask Bigger Biological Questions

Why Big Data Matters? Speaker: Paras Doshi

MARKETING BENCHMARKS 7,000+ from. Businesses

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

The Age of Intelligent Data Systems: An Introduction with Application Examples. Paulo Cortez (ALGORITMI R&D Centre, University of Minho)

Is Machine Learning the future of the Business Intelligence?

Welcome to. enterprise-class big data and financial a. Putting big data and advanced analytics to work in financial services.

The Internet of Things Platform

Big Data Analytics and Artificial Intelligence

Real-time Streaming Insight & Time Series Data Analytic For Smart Retail

Managing the Cognitive Transformation. Paul Harmon

SOA, Web 2.0, and Web Services

INSIGHTS & BIG DATA. Data Science as a Service BIG DATA ANALYTICS

Sample Exam. INFS 2019: Intro to e-business 30 marks 2h

A Framework for Analyzing Twitter to Detect Community Crime Activity

Edge Analytics for IoT Device Intelligence

Branding. KDLH Promotion 1. Branding. Wow, you are pretty big. Building Your Image into the Community. Building Your Image into the Community

The Land Code. FIG-Working Week 2017 Helsinki, 30 May Dr. Daniel Steudler

SYSPRO Product Roadmap Q Version 03

Machina Research White Paper for ABO DATA. Data aware platforms deliver a differentiated service in M2M, IoT and Big Data

Digital Transformation and the IIoT

Chapter 9. Business Intelligence Systems

SharePoint & The Cloud

Big Data: Essential Elements to a Successful Modernization Strategy

Increase Value and Reduce Total Cost of Ownership and Complexity with Oracle PaaS

TEQIP Short Term course on Big Data 7-11 Aug 2017

Big Data Analytics: Future Architectures, Skills and Roadmaps for the CIO

Things you should know about The Industrial Internet of Things (Industrial IoT) Sohail Nazari, PhD, Peng ANDRITZ AUTOMATION

TEXT MINING APPROACH TO EXTRACT KNOWLEDGE FROM SOCIAL MEDIA DATA TO ENHANCE BUSINESS INTELLIGENCE

The good, the bad and the ugly - Celebrating Libraries

Smart City Monitor: Enabling digital transformation by Intelligent Sustainable Systems

Gartner's Top 10 Strategic Technology Trends for 2014

Multi-Touchpoint Marketing

IMPORTANCE OF BIG DATA IN MARITIME TRANSPORT

Predicting Yelp Ratings From Business and User Characteristics

nallian Data Sharing as the New Normal? Value Chain Collaboration in the Cloud Alex SAS FORUM 2015

Power Productivity on LinkedIn. Hubspot & Jill Konrath

CPET 581 E-Commerce & Business Technologies

2014 SAP SE or an SAP affiliate company. All rights reserved. 1

Transcription:

http://poloclub.gatech.edu/cse6242 CSE6242 / CX4242: Data & Visual Analytics Analytics Building Blocks Duen Horng (Polo) Chau Assistant Professor Associate Director, MS Analytics Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos

What is Data & Visual Analytics? 2

What is Data & Visual Analytics? No formal definition! 2

What is Data & Visual Analytics? No formal definition! Polo s definition: the interdisciplinary science of combining computation techniques and interactive visualization to transform and model data to aid discovery, decision making, etc. 2

What are the ingredients? 3

What are the ingredients? Need to worry (a lot) about: storage, complex system design, scalability of algorithms, visualization techniques, interaction techniques, statistical tests, etc. Wasn t this complex before this big data era. Why? 3

http://spanning.com/blog/choosing-between-storage-based-and-unlimited-storage-for-cloud-data-backup/ 4

What is big data? Why care? ( big data is buzz word, so is IoT - Internet of Things) Many companies businesses are based on big data (Google, Facebook, Amazon, Apple, Symantec, LinkedIn, and many more) Web search Rank webpages (PageRank algorithm) Predict what you re going to type Advertisement (e.g., on Facebook) Infer users interest; show relevant ads Infer what you like, based on what your friends like Recommendation systems (e.g., Netflix, Pandora, Amazon) Online education Health IT: patient records (EMR) Bio and Chemical modeling: Finance Cybersecruity Internet of Things (IoT)

Good news! Many jobs! Most companies are looking for data scientists The data scientist role is critical for organizations looking to extract insight from information assets for big data initiatives and requires a broad combination of skills that may be fulfilled better as a team - Gartner (http://www.gartner.com/it-glossary/data-scientist) Breadth of knowledge is important. This course helps you learn some important skills.

Analytics Building Blocks

Collection Cleaning Integration Analysis Visualization Presentation Dissemination

Building blocks, not steps Collection Cleaning Integration Analysis Visualization Presentation Dissemination Can skip some Can go back (two-way street) Examples Data types inform visualization design Data informs choice of algorithms Visualization informs data cleaning (dirty data) Visualization informs algorithm design (user finds that results don t make sense)

How big data affects the process? Collection Cleaning Integration Analysis Visualization Presentation Dissemination The Vs of big data (3Vs, 4Vs, now 7Vs) Volume: billions, petabytes are common Velocity: think Twitter, fraud detection, etc. Variety: text (webpages), video (youtube) Veracity: uncertainty of data Variability Visualization Value http://www.ibmbigdatahub.com/infographic/four-vs-big-data http://dataconomy.com/seven-vs-big-data/

Gartner's 2016 Hype Cycle http://www.gartner.com/newsroom/id/3412017 https://en.wikipedia.org/wiki/hype_cycle

Artificial Intelligence

We re in the 3rd wave of AI boom Two AI winters before https://en.wikipedia.org/wiki/history_of_artificial_intelligence We should be cautiously optimistic (Polo s motto)

AI Safety

Schedule Collection Cleaning Integration Analysis Visualization Presentation Dissemination

Two Example Projects from Polo Club

Apolo Graph Exploration: Machine Learning + Visualization Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. CHI 2011. 18

19

Beautiful Hairball Death Star Spaghetti 19

Finding More Relevant Nodes HCI Paper Citation network Data Mining Paper 20

Finding More Relevant Nodes HCI Paper Citation network Data Mining Paper 20

Finding More Relevant Nodes HCI Paper Citation network Data Mining Paper Apolo uses guilt-by-association (Belief Propagation) 20

Demo: Mapping the Sensemaking Literature Nodes: 80k papers from Google Scholar (node size: #citation) Edges: 150k citations 21

Key Ideas (Recap) Specify exemplars Find other relevant nodes (BP) 23

What did Apolo go through? Collection Scrape Google Scholar. No API :( Cleaning Integration Analysis Visualization Presentation Design inference algorithm (Which nodes to show next?) Interactive visualization you just saw Paper, talks, lectures Dissemination

Apolo: Making Sense of Large Network Data by Combining Rich User Interaction and Machine Learning. Duen Horng (Polo) Chau, Aniket Kittur, Jason I. Hong, Christos Faloutsos. ACM Conference on Human Factors in Computing Systems (CHI) 2011. May 7-12, 2011. 25

NetProbe: Fraud Detection in Online Auction NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. WWW 2007

NetProbe: The Problem Find bad sellers (fraudsters) on ebay who don t deliver their items $$$ Seller Buyer Auction fraud is #3 online crime in 2010 source: www.ic3.gov 27

28

NetProbe: Key Ideas Fraudsters fabricate their reputation by trading with their accomplices Fake transactions form near bipartite cores How to detect them? 29

NetProbe: Key Ideas Use Belief Propagation F A H Fraudster Accomplice Darker means more likely Honest 30

NetProbe: Main Results 31

32

32

Belgian Police 32

33

What did NetProbe go through? Collection Scraping (built a scraper / crawler ) Cleaning Integration Analysis Design detection algorithm Visualization Presentation Dissemination Paper, talks, lectures Not released

NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. International Conference on World Wide Web (WWW) 2007. May 8-12, 2007. Banff, Alberta, Canada. Pages 201-210. 35

Homework 1 (out next week; tasks subject to change) Collection Cleaning Integration Analysis Visualization Presentation Dissemination Simple End-to-end analysis Collect data using API) Movies (Actors, directors, related movies, etc.) Store in SQLite database Transform data to movie-movie network Analyze, using SQL queries (e.g., create graph s degree distribution) Visualize, using Gephi Describe your discoveries