Siim Karus 2011 Fall

1 Siim Karus 2011 Fall

2 Business Intelligence: Data Acquisition; Data Analysis; Results presentation

3 Definition; Relation to Data Mining; Themes of BI, history; Applications of BI

4 "The ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal." (Hans Peter Luhn, 1958, IBM)

5 Improving Business Insight: "A broad category of applications and technologies for gathering, storing, analyzing, sharing and providing access to data to help enterprise users make better business decisions." (Gartner)

6 all

7

8

9 Seek Profitable Customers; Identify Problem Areas; Correct Data During ETL; Understand Customer Needs; Descriptive Analysis; Detect and Prevent Fraud; Predictive Analysis; Anticipate Customer Churn; Performance Monitoring; Business Activity Monitoring; Build Effective Marketing Campaigns; Predict Sales & Inventory

10 Define Data; Identify Task; Get Results

11 Sources (choice of features, process or content centric); Extract, transform and load (ETL vs. ELT); Storage (Hadoop, StreamInsight, BigData)
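The difference between ETL and ELT is where the transformation runs. A minimal sketch using Python's built-in sqlite3 module, with hypothetical table names and rows: the ETL path cleans rows in application code before loading them, while the ELT path loads the raw text unchanged and cleans it with SQL inside the target database. The numeric check in the ELT query is deliberately crude and only suits this toy data.

```python
import sqlite3

# Hypothetical raw source rows: (customer, amount as text, currency)
RAW_ROWS = [
    ("alice", "10.50", "EUR"),
    ("Bob ", "7", "eur"),
    ("carol", "n/a", "EUR"),  # unparseable amount, should be rejected
]


def clean(row):
    """Normalise one raw row; return None if the amount cannot be parsed."""
    name, amount, currency = row
    try:
        value = float(amount)
    except ValueError:
        return None
    return (name.strip().lower(), value, currency.upper())


def etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    rows = [r for r in map(clean, RAW_ROWS) if r is not None]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", rows)


def elt(conn):
    """ELT: load the raw text as-is, then transform inside the database."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT LOWER(TRIM(customer)) AS customer,
               CAST(amount AS REAL)  AS amount,
               UPPER(currency)       AS currency
        FROM sales_raw
        WHERE amount GLOB '[0-9]*'
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    for table in ("sales_etl", "sales_elt"):
        print(table, conn.execute(f"SELECT * FROM {table} ORDER BY customer").fetchall())
```

Both paths end with the same two clean rows; ELT additionally keeps the untouched `sales_raw` table, which lets the transformation be re-run or changed later without re-extracting.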

12 Measurement tools: Thermometer, Clock; Storage: Tablets, Books, Database Systems; Extraction tools: SQL queries, ETL systems

13

14

15

16

17

18

19

20 Generating a unified model (Data Warehouse); Cleaning data; Merging data; Applying aggregations; Evaluating data; Splitting data; Data Transformations (ETL vs. ELT)
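Cleaning, merging, aggregating and splitting can be illustrated on toy data. A minimal sketch in plain Python, assuming hypothetical customer and order records and an assumed cleaning rule (non-positive amounts are invalid); building the full data-warehouse model and "evaluating data" are not covered.

```python
import random
from collections import defaultdict

# Hypothetical sources describing the same customers.
CUSTOMERS = [{"id": 1, "region": "north"}, {"id": 2, "region": None}]
ORDERS = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": -5.0},  # invalid under the assumed rule below
    {"customer_id": 2, "amount": 7.5},
    {"customer_id": 3, "amount": 2.0},   # no matching customer
]


def clean(orders):
    """Cleaning: drop orders with non-positive amounts (an assumed business rule)."""
    return [o for o in orders if o["amount"] > 0]


def merge(customers, orders):
    """Merging: attach each order's region; unmatched orders are set aside."""
    by_id = {c["id"]: c for c in customers}
    merged, orphans = [], []
    for o in orders:
        customer = by_id.get(o["customer_id"])
        if customer is None:
            orphans.append(o)
        else:
            merged.append({**o, "region": customer["region"] or "unknown"})
    return merged, orphans


def aggregate(rows, key):
    """Aggregation: total amount per value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


def split(rows, test_fraction=0.3, seed=0):
    """Splitting: shuffle reproducibly and cut into training and evaluation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Returning the orphan orders instead of silently dropping them keeps the merge auditable: unmatched keys are often the first sign of a data quality problem in one of the sources.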

21 Hadoop; BigData; StreamInsight

22 Know the right questions to ask; Beware of feedback loops

23 Know the right questions to ask; Beware of feedback loops

24 Know the right questions to ask; Beware of feedback loops

25 Know the right questions to ask; Beware of feedback loops

26 Know the right questions to ask; Beware of feedback loops; Do not forget to model missing data
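Why missing data should be modelled rather than dropped can be shown with a count. A minimal sketch, assuming hypothetical survey answers where `None` marks a missing answer and a hypothetical explicit member named `Unknown`: dropping the missing rows shrinks the denominator, so the "yes" share jumps from 1/3 to 2/3.

```python
from collections import Counter

UNKNOWN = "Unknown"  # hypothetical explicit member standing in for missing values

# Hypothetical answers to "Would you recommend us?"; None marks a missing answer.
ANSWERS = ["yes", None, "no", "yes", None, None]


def share(counts, member):
    """Fraction of all counted rows that fall into `member`."""
    return counts[member] / sum(counts.values())


# Dropping missing rows silently changes the denominator...
dropped = Counter(a for a in ANSWERS if a is not None)
# ...while an explicit member keeps every row and makes missingness visible.
modelled = Counter(UNKNOWN if a is None else a for a in ANSWERS)
```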

27 Visual Mining (cubes, dimensions, partitions); Data Mining (choice of algorithms); Visual Data Mining (learning from user interactions with results); Social BI; Interpretation

28 Visual Learning: Tablets, Pivot; Machine Learning: Excel, Statistics Suites, Analysis Suites; Visual BI: Analysis Suites; Social BI: People

29

30

31 Regular (Simple) Dimensions (Star schema); Referenced Dimensions (Snowflake schema); Fact (Many-to-Many) Dimensions
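The three dimension kinds differ in how the fact table reaches the dimension. A minimal sketch using sqlite3 with a hypothetical schema: `customer` joins the fact table directly (regular, star), `category` is reached only through `product` (referenced, snowflake), and `customer_segment` is a bridge table that links one customer to several segments (many-to-many).

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT,
                      category_id INTEGER REFERENCES category(category_id));
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_id INTEGER, customer_id INTEGER, amount REAL);
-- bridge table: one customer can belong to several segments
CREATE TABLE customer_segment (customer_id INTEGER, segment TEXT);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Drinks"), (2, "Food")])
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(10, "Tea", 1), (11, "Coffee", 1), (12, "Bread", 2)])
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(100, "Alice"), (101, "Bob")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(10, 100, 3.0), (11, 100, 4.0), (12, 101, 5.0)])
    conn.executemany("INSERT INTO customer_segment VALUES (?, ?)",
                     [(100, "Students"), (100, "Locals"), (101, "Locals")])
    return conn


# Regular dimension (star): the fact table joins the dimension directly.
BY_CUSTOMER = """SELECT c.name, SUM(f.amount) FROM fact_sales f
                 JOIN customer c ON c.customer_id = f.customer_id GROUP BY c.name"""
# Referenced dimension (snowflake): category is reached through product.
BY_CATEGORY = """SELECT k.name, SUM(f.amount) FROM fact_sales f
                 JOIN product p ON p.product_id = f.product_id
                 JOIN category k ON k.category_id = p.category_id GROUP BY k.name"""
# Many-to-many dimension: the join goes through the bridge table.
BY_SEGMENT = """SELECT s.segment, SUM(f.amount) FROM fact_sales f
                JOIN customer_segment s ON s.customer_id = f.customer_id
                GROUP BY s.segment"""


def totals(conn, sql):
    return dict(conn.execute(sql).fetchall())
```

Note the many-to-many result: the segment totals (Locals 12.0, Students 7.0) add up to 19.0 although total sales are 12.0, because Alice's sales are counted once per segment she belongs to. Such dimensions are not additive across their members.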

32

33 Simple aggregations: Min, Max, Average, Sum, Count; Complex aggregations: Difference with previous period, Conditional sum or count; Calculation types: Precomputed vs. computed during runtime, Over visible nodes vs. over all nodes
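The aggregation types can be computed over a handful of hypothetical monthly sales facts. A minimal sketch in plain Python: the simple aggregations, the difference with the previous period, a conditional count, and a total taken either over all regions or only over the regions currently visible (for example after a user filters a report). Storing the result of `monthly_totals` once would correspond to the "precomputed" option; calling the functions per query corresponds to "computed during runtime".

```python
from statistics import mean

# Hypothetical facts: (month, region, amount)
SALES = [
    ("2011-01", "north", 100.0), ("2011-01", "south", 40.0),
    ("2011-02", "north", 120.0), ("2011-02", "south", 10.0),
    ("2011-03", "north", 90.0),
]
amounts = [amount for _, _, amount in SALES]

# Simple aggregations
simple = {"min": min(amounts), "max": max(amounts), "average": mean(amounts),
          "sum": sum(amounts), "count": len(amounts)}


def monthly_totals(rows):
    """Sum per month, ordered by month."""
    totals = {}
    for month, _, amount in rows:
        totals[month] = totals.get(month, 0.0) + amount
    return dict(sorted(totals.items()))


def diff_previous_period(totals):
    """Complex aggregation: change versus the preceding month (the first month has none)."""
    months = list(totals)
    return {cur: totals[cur] - totals[prev] for prev, cur in zip(months, months[1:])}


def conditional_count(rows, predicate):
    """Complex aggregation: count only the rows that satisfy a condition."""
    return sum(1 for row in rows if predicate(row))


def total(rows, visible_regions=None):
    """Total over all nodes (None) or only over the currently visible regions."""
    return sum(a for _, region, a in rows
               if visible_regions is None or region in visible_regions)
```

With "north" as the only visible region, the visible total is 310.0 while the total over all nodes stays 360.0; a report has to decide which of the two its grand-total row shows.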

34 Split cube by dimension values. Partitioning: Different data sources; Different storage policies (e.g. operative non-cached data in a ROLAP partition and historic cached data in a MOLAP partition); Read-only vs. read-write partitions
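The partitioning policy in the example (cached historic data, live operative data) can be mimicked in memory. A minimal sketch, and only an analogy to how an OLAP server stores partitions: facts are split by the year dimension; years before a hypothetical `CURRENT_YEAR` are pre-aggregated once and exposed read-only (MOLAP-like), while the current year keeps raw rows that accept writes and are summed at query time (ROLAP-like).

```python
from collections import defaultdict
from types import MappingProxyType

# Hypothetical fact rows: (year, amount)
FACTS = [(2009, 5.0), (2010, 7.0), (2010, 3.0), (2011, 4.0)]
CURRENT_YEAR = 2011  # assumption: the operative partition holds only this year


class PartitionedSum:
    """A sum measure whose rows are split by the value of the year dimension."""

    def __init__(self, facts, current_year):
        self.current_year = current_year
        historic = defaultdict(float)
        self.operative = []  # read-write, aggregated at query time (ROLAP-like)
        for year, amount in facts:
            if year < current_year:
                historic[year] += amount  # aggregated once and cached (MOLAP-like)
            else:
                self.operative.append((year, amount))
        # Read-only view of the cached historic aggregates.
        self.historic = MappingProxyType(dict(historic))

    def add(self, year, amount):
        """Writes are accepted only by the operative partition."""
        if year < self.current_year:
            raise ValueError("historic partition is read-only")
        self.operative.append((year, amount))

    def total(self, year):
        if year < self.current_year:
            return self.historic.get(year, 0.0)
        return sum(a for y, a in self.operative if y == year)
```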

35 Subsets of cubes (not necessarily subcubes). Purpose: show cube data relevant to different stakeholders

36

37

38 Learning from users' analysis-decision patterns [Diagram, labels only: Raw data; Data Representation; High dimensionality data; Domain Expert; Domain knowledge; Dimensionality Reduction; Low dimensionality data; Visualisation; Visualisation evaluation; User; User Feedback]

39 reCAPTCHA; Prediction markets; Social media sites as data source

40 We do not model causality. We only model dependence.
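Dependence is broader than linear correlation, which is also why "uncorrelated does not imply independent" appears among the interpretation mistakes on slide 41. A minimal sketch with hypothetical data: `y = x * x` is completely determined by `x`, yet the Pearson correlation over a symmetric range of `x` is exactly zero.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient: measures only linear association."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y is completely determined by x
r = pearson(xs, ys)
print(r)  # 0.0
```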

41 Choice of relevant baseline: Random guess, Mean, Most common class, No change; Measures of performance: Precision and recall, Lift, Cumulative Gain, Mean Relative Error, Mean Absolute Error; Interpretation mistakes: Post hoc ergo propter hoc, Cum hoc ergo propter hoc, Affirming the consequent, Confirmation bias, Confounding, Uncorrelated does not imply independent, Third-cause fallacy, Selection bias, Sampling bias
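A performance measure means little until it is compared with a baseline. A minimal sketch in plain Python with hypothetical churn labels, predictions and scores, plus hypothetical monthly sales forecasts: precision and recall against the "most common class" baseline, lift and cumulative gain for the top-scored fraction of customers, and mean absolute and mean relative error against the "mean" and "no change" baselines. The random-guess baseline is not shown.

```python
from collections import Counter


def precision_recall(actual, predicted, positive=1):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def top_fraction(actual, scores, fraction):
    """Actual labels of the highest-scored `fraction` of cases."""
    ranked = [a for _, a in sorted(zip(scores, actual), reverse=True)]
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


def cumulative_gain(actual, scores, fraction, positive=1):
    """Share of all positives captured within the top-scored fraction."""
    top = top_fraction(actual, scores, fraction)
    return top.count(positive) / actual.count(positive)


def lift(actual, scores, fraction, positive=1):
    """Positive rate in the top-scored fraction relative to the overall rate."""
    top = top_fraction(actual, scores, fraction)
    return (top.count(positive) / len(top)) / (actual.count(positive) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mre(actual, predicted):
    """Mean relative error; assumes no actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical churn labels (1 = churned), model predictions and model scores.
churned = [1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.2, 0.1, 0.8, 0.3, 0.7, 0.05]
# "Most common class" baseline: always predict the majority label.
majority = Counter(churned).most_common(1)[0][0]
baseline = [majority] * len(churned)

# Hypothetical monthly sales with model forecasts and two naive baselines.
sales = [100.0, 120.0, 110.0, 130.0]
forecast = [105.0, 115.0, 112.0, 128.0]
mean_baseline = [sum(sales) / len(sales)] * len(sales)
no_change = sales[:-1]  # predicts month t+1 with the value of month t
```

Here the majority baseline never predicts churn, so its recall is 0.0 even though it is right on 5 of 8 customers; the model's top 50% by score contains all three churners, giving a cumulative gain of 1.0 and a lift of 2.0. The forecast's MAE of 3.5 is compared with 10.0 for the mean baseline and about 16.7 for the no-change baseline (evaluated on the last three months only).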

42 Basic reporting (what, how, data overload); Events, reactions; Estimation fallacy (self-reference); Decision Making; Process Improvement

43 Visualisation: Tablets, Drawings, Reporting Suites; Process Automation: Ticket Systems, Tracking Systems; Notification: Alerts; Business Process Management Solutions

44

45

46 Report types: Forms, Tables, Charts, Gauges, Scorecards; Interactions: Drill down, Drill through, Linked views, Writeback; Delivery: On-demand, Runtime, Cached, Subscription based, Published; Security: Dimension filters, Consumer specific reports
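Drill down and drill through are easy to confuse. A minimal sketch with hypothetical order lines and a two-level date hierarchy: drill down keeps the aggregated measure but moves from the year level to the month level, while drill through returns the underlying detail rows behind one aggregated cell.

```python
from collections import defaultdict

# Hypothetical order lines: (year, month, order_id, amount)
ORDERS = [
    (2011, 1, "A1", 10.0), (2011, 1, "A2", 5.0),
    (2011, 2, "A3", 8.0), (2010, 12, "B1", 4.0),
]


def summarize(rows, level):
    """Aggregate amounts at the 'year' or 'month' level of the date hierarchy."""
    totals = defaultdict(float)
    for year, month, _, amount in rows:
        key = year if level == "year" else (year, month)
        totals[key] += amount
    return dict(totals)


def drill_down(rows, year):
    """Drill down: the same measure for one year, split to the next (finer) level."""
    return summarize([r for r in rows if r[0] == year], "month")


def drill_through(rows, year, month):
    """Drill through: the underlying detail rows behind a single aggregated cell."""
    return [r for r in rows if (r[0], r[1]) == (year, month)]
```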

47 Upper management: Scorecards, Drill through, Publishing; Middle management: Dashboards, Drill down, On-demand; Lower management: Tables, Forms, Runtime

48 KPI based: Value, Goal, Trend, Status; Event based: Threshold, Event; Delivery: Dashboard, Mail, SMS, Push-notifications
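The KPI-based and event-based approaches can be contrasted in a few lines. A minimal sketch, assuming a higher-is-better KPI and hypothetical status bands (goal met, within 10% of goal, further off) encoded as 1, 0 and -1; the trend compares the value with the previous period. The event-based part raises one event each time a reading crosses above a threshold rather than on every reading above it. Delivering the events (dashboard, mail, SMS, push) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """KPI with value, goal and previous-period value; assumes higher is better."""
    name: str
    value: float
    goal: float
    previous: float

    @property
    def status(self) -> int:
        """1 = goal met, 0 = within 10% of goal, -1 = further off (assumed bands)."""
        ratio = self.value / self.goal
        if ratio >= 1.0:
            return 1
        return 0 if ratio >= 0.9 else -1

    @property
    def trend(self) -> int:
        """1 = up, 0 = flat, -1 = down compared with the previous period."""
        return (self.value > self.previous) - (self.value < self.previous)


def threshold_events(readings, threshold):
    """Event-based monitoring: one event each time a reading rises above `threshold`."""
    events, above = [], False
    for timestamp, value in readings:
        if value > threshold and not above:
            events.append((timestamp, value))
        above = value > threshold
    return events
```

Firing only on the crossing avoids flooding recipients with repeated alerts while a value stays above the threshold.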

49 Dashboards; Scorecards; Scenario analysis; Bottleneck identification. Getting the relevant data to people who need it when they need it.

50

51

52

53

54 Confusing correlation with causality

55 Confusing correlation with causality; Showing too little or too much data; Forgetting about drill actions; Ignoring analysis results; Overreacting or misinterpreting

56

57

58 Name one BI task you solve daily. Describe your: Data sources; Analytic process; Decision support

59 Armor your aircraft. Download the datasheet with the aircraft's past battle damage report. Choose which aircraft parts to armor (. Enter your specification into the simulator (you will get to run it and redesign your aircraft in the practice session). Keep in mind that armor increases weight and reduces mobility. Armor increases a part's durability approximately 3 times.

60 Complete self-referential aptitude test:

61 /en/html/lu2_learningobject3.html