Data Mining and Knowledge Discovery in Large Databases


1 Data Mining and Knowledge Discovery in Large Databases. Erik Kropat, University of the Bundeswehr Munich, Germany. "We are drowning in data, but we are starving for knowledge." Outline, Part 2: Clustering - Hierarchical Clustering - Divisive Clustering - Density-based Clustering

2 Why Data Mining? Companies are collecting massive amounts of data on customers, operations, and the competitive landscape. Firms can gain a competitive advantage from these data, but there is far too much data. Online shops record the purchase behaviour of millions of customers (sometimes with hundreds of features for each customer). Phone companies keep information on hundreds of millions of accounts (each with thousands of transactions). Databases can often be hundreds of terabytes in size (this will be peanuts in the future).

3 Why Data Mining? We are drowning in data, but we are starving for knowledge (John Naisbitt)

4 Knowledge Discovery in Large Databases: the process of finding valuable and useful patterns in datasets.

5 Analysis of data sets from: businesses & investments; finance & economics; science & technology; bioinformatics; telecommunication. Or of more complex data sets: multimedia & sound; images & video; automatic news analysis; social media analysis.

6 What are the data sources? Consumer data: credit card transaction data; supermarket transaction data; loyalty cards; web server logs; social media. Variety of features: name and address; history of shopping and purchases; demographics; credit rating; quality & market share of products.

7 Business Intelligence. Customer Data Analytics & Market Analysis: customer segmentation; market basket analysis; target marketing; geo-marketing; cross-selling / up-selling; customer relationship management.

8 Market Basket Analysis: Cross-Selling

9 Key Tasks: Decision Trees; Association Rule Learning; Neural Networks; Digital Forensics; Automatic Derivation of Ontologies.

10 Retail. Customer segmentation: identify purchase patterns of typical customers; targeted advertisement, customized pricing, cost-effective promotions. Market basket analysis: identify the purchase behaviour of groups of customers. Sales promotions: identify likely responders to sales promotions.
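
To make market basket analysis a little more concrete, here is a minimal sketch in Python. It is not taken from the slides: the toy transactions, the function name `pair_rules`, and the `min_support` threshold are all assumptions chosen for illustration. It counts how often pairs of items are bought together and derives two standard association-rule measures, support (the share of all baskets containing both items) and confidence (the share of baskets with the left-hand item that also contain the right-hand item).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def pair_rules(transactions, min_support=0.3):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence, such as "customers who buy bread also tend to buy butter" in this toy data, is the kind of pattern a retailer might use for cross-selling. Practical systems handle itemsets larger than pairs and far larger databases, typically with dedicated algorithms such as Apriori or FP-Growth.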

11 Banking. Credit rating: given a large number of names, which persons are likely to default on their credit cards? Fraud detection: credit card fraud detection; network intrusion detection.

12 Telecommunications. Companies are facing escalating competition and are forced to aggressively market special pricing programs aimed at retaining existing customers and attracting new ones. Call detail record analysis: identify customer segments with similar use patterns; offer attractive pricing and feature promotions. Customer loyalty / customer churn management: some customers repeatedly churn (switch providers); identify those who are likely to switch and those who are likely to remain loyal. Companies can then target their spending on the customers who will produce the most profit and set pricing strategies in a highly competitive market.

13 Big Data is Big Business. Companies are using their data sets to aim their services and products with increasing precision. Business Intelligence: SAP AG is a German global software corporation that provides enterprise software applications and is one of the largest enterprise software companies. In October 2007, SAP AG announced a $6.8 billion deal to acquire Business Objects. Since 2009, Business Objects has been a division of SAP AG rather than a separate company.


15 Outline
Part 1: Introduction
- What is Data Mining?
- Examples
Part 2: Formal Concept Analysis
- Contexts and Concepts
- Concept Lattices
Part 3: Clustering
- Hierarchical Clustering
- Partitional Clustering
- Fuzzy Clustering
- Graph-Based Clustering
Part 4: Classification
- k-Nearest Neighbors
- Support Vector Machines
Part 5: Spatial Data Mining
- DBSCAN
- Density & Connectivity
Part 6: Regulatory Networks
- Eco-Finance Networks
- Gene-Environment Networks

16 Questions? For more information after today me at