Introduction to Analytics and Data Science. Pekka Malo, Assist. Prof. (statistics) Aalto BIZ / Department of Information and Service Economy

1 Introduction to Analytics and Data Science Pekka Malo, Assist. Prof. (statistics) Aalto BIZ / Department of Information and Service Economy

3 Agenda
- ADS seminar
- Administrative information
- Schedule and speakers
- From Data to Decision: Analytic Thinking
- Survey of Business Tech Trends
- Why do many Big Data projects fail?
- Analytics & Data Science as a job
- Course: Data Science for Business

4 ADS seminar
Credit units: 2
The course is compulsory for completing the ADS minor.
Course workload:
1. Attendance: students should participate in at least 5 out of 6 lectures.
2. Assignment: Data analytics using IBM Watson and Bluemix. Details and instructions will be given in the lecture of Sep 21.

5 Schedule and speakers
- Sep 7: Pekka Malo (BIZ), Matti Sarvimäki (BIZ)
- Sep 14: Jari Saramäki (SCI), Pekka Marttinen (SCI)
- Sep 21: Jouko Puotanen (IBM), talk on IBM Watson and Bluemix
- Sep 28: Kirsi Virrantaus (ENG), Aki Vehtari (SCI)
- Oct 5: Salo Ahti (SCI), Pauliina Ilmonen (SCI)
- Oct 12: Mikko Kurimo (ELEC), Keijo Heljanko (SCI)

6 From Data to Decision Analytic Thinking

7 "No one has the ability to capture and analyze data from the future. However, there is a way to predict the future using data from the past. It's called predictive analytics, and organizations do it every day." - Tom Davenport, HBR
"The best way to predict the future is to learn from failed predictive analytics." - Michael Schrage, HBR

8 Business Tech Trends
Data Science for Business

9 Data Analytics is increasingly important Source: The 2015 Nordic survey on Big Data and Hadoop (by SAS Institute and Intel)

10 Evolution of Business Intelligence
91% are investing (2013 survey of Fortune 1000 companies)
Business Intelligence
- Descriptive statistics from data with high information density
- Strong commercial platforms
Big Data 1.0
- High volume, high velocity and high variety of information (but lower information density)
- Focus on building the capabilities for processing large data
- Support for current operations, improved efficiency
Big Data 2.0
- Predictive analytics (relations, causal effects)
- Large sets of data with low information density
- Focus on creating new ways of doing business, instead of polishing current operations

11 Ask Google Trends: BI vs. Big Data
[Figure: Google Trends search interest for "business intelligence" and "big data", 2004 to 2014; vertical axis from 0 to 120]
Source: Google Trends

12 Business Intelligence vs. Big Data Analytics Source: Debortoli et al. (2014): Comparing Business Intelligence and Big Data Skills: A Text Mining Study Using Job Advertisements. Business & Information Systems Engineering.

13 What do we expect from data and analytics?
- Insights
- Speed
- More sophisticated / granular insights
- Cost savings

14 Primary use case or scenario? Source: The 2015 Nordic survey on Big Data and Hadoop (by SAS Institute and Intel)

15 Primary use case or scenario by industry Source: The 2015 Nordic survey on Big Data and Hadoop (by SAS Institute and Intel)

16 Adoption and future directions Source: The 2015 Nordic survey on Big Data and Hadoop (by SAS Institute and Intel)

17 Big Data in Research Source: Porter et al. (2015): MetaData: BigData Research Evolving Across Disciplines, Players, and Topics. In proceedings of 2015 IEEE International Conference on Big Data.

18 Source: Porter et al. (2015)

19 Why do many Big Data projects fail?
Becoming an Analytical Innovator

20 Big Data analytics is getting popular but needs to be handled with extreme caution
Data
- Do you trust your source?
- Is the data representative of your target group?
Assumptions
- Are you aware of the assumptions of your analysis approach?
- Are your assumptions still true? Monitor!
Statistical wizardry
- Do you have the data scientists who can do this?
- Do you have managers who understand what is really going on?

21 All that glitters is not gold
Beware the button effect
- Statistically significant answer != the real, important answer
- Easy answers may be dangerous too
Big data has a flavor of dumpster diving
- Loads of data out there, but true gold nuggets are rare
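The gap between a statistically significant answer and an important one can be illustrated with a short calculation. The sketch below is not from the slides: it assumes a two-sample z-test with equal group sizes and a hypothetical, deliberately tiny effect of 0.02 standard deviations, and shows how the p-value of that same effect changes as the sample grows.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p_value(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test.

    effect_size is the observed difference in group means, measured in
    standard deviations; both groups contain n_per_group observations.
    """
    z = effect_size * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical scenario: the same tiny difference (0.02 standard
# deviations) observed in a small sample and in a "big data" sample.
TINY_EFFECT = 0.02
for n in (100, 10_000, 1_000_000):
    p = two_sample_p_value(TINY_EFFECT, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>9,}: p = {p:.4f} ({verdict} at 5%)")
```

The effect size never changes in this sketch; only the amount of data does. With enough rows almost any difference passes a significance test, so the size of the effect still has to be judged against the business question.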

22 Why can many investments in Big Data fail to pay off?
Expectations are often unrealistic
- Big data has been heavily hyped
The path to value with analytics is getting crowded
- More organizations (incl. competitors) are boosting their analytics
- The solution you have just found may already be in good use by others
Companies don't do a good job with the information (small data) they already have
- Don't know how to manage their data and analyze their operations
- Not able to respond to the insights they have learned from the data
- Management experience overrides data analysis

23 Source: Kiron et al. (2014)

24 Obstacles in technology adoption

25 Learn from Your Analytics Failures
"While the computational resources and techniques for prediction may be novel and astonishingly powerful, many of the human problems and organizational pathologies appear depressingly familiar." - Michael Schrage (HBR)

26 What makes an exceptional data scientist?
X factor = curiosity
- Ask a series of questions
- Accept analytic failures
- Bigger questions, better insights, more valuable decisions
The best way to predict the future is to learn from failed predictive analytics

27 The Need for Culture Source: Kiron et al. (2014)

28 Data-informed culture ~ the secret sauce
Behavior
- Analytics as part of strategy
- Collaborative use of data
- Promotion of best analytics practices
- Invest in tech, talent, training
- Management is data-driven
Values
- Data is a core asset
- Analytics is a mandate driven by executives
Decision-making norms
- Analytical insights guide future strategy
- Analytics outweighs management experience in key issues
- Openness to ideas, ability to challenge current practices

29 Does your company have a data-informed culture?
A few questions to check your company's current standing:
- Do operational decision makers have clear business rules*?
- Do you create and revise business rules on the basis of business analytics?
- Do you provide high-quality coaching to employees who make decisions on a regular basis?
- Have business leaders accepted ownership of key data?
- Do you have an undisputed source of performance data?
- Do individuals receive daily feedback on performance?
Source: Ross et al. (2013)
*) business rule = mechanism for specifying what actions should be taken in a given circumstance
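To make the footnote's definition concrete, here is a minimal sketch of a business rule as a condition paired with an action. Everything in it is hypothetical and chosen only for illustration: the `BusinessRule` class, the `decide` helper, the order fields and the 5,000 euro threshold do not come from Ross et al. (2013).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BusinessRule:
    """A circumstance (condition) paired with the action to take."""
    name: str
    applies: Callable[[dict], bool]
    action: str


def decide(order: dict, rules: list[BusinessRule], default: str) -> str:
    """Return the action of the first rule whose condition holds."""
    for rule in rules:
        if rule.applies(order):
            return rule.action
    return default


# Hypothetical threshold; in a data-informed company it would be
# revised as analytics shows which orders actually cause losses.
REVIEW_LIMIT_EUR = 5_000

rules = [
    BusinessRule(
        "new customer, large order",
        lambda o: o["is_new_customer"] and o["amount_eur"] > REVIEW_LIMIT_EUR,
        "send to manual credit review",
    ),
]

print(decide({"is_new_customer": True, "amount_eur": 8_000}, rules, "approve"))
print(decide({"is_new_customer": False, "amount_eur": 8_000}, rules, "approve"))
```

Keeping the threshold as a named value separate from the rule logic is one way to let analytics revise a rule without rewriting it.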

30 Adopting a data-informed culture is often difficult
Highly regarded, high-performing data skeptics
- Problem: Afraid that performance measures will not capture the true value of their contribution
- Remedy: Need to be involved early and have a say in the development of metrics
Source: Bladt, J. and Filbin, B., HBR, May 16

31 Adopting a data-informed culture is often difficult
The data antagonists
- Problem: Coworkers love them, but they are really afraid of being found out! Sometimes nice ideas, but a lot of shooting in the dark
- Remedy?
Source: Bladt, J. and Filbin, B., HBR, May 16

32 Three Levels of Analytical Organizations
What is your current standing?
Source: Kiron et al. (2014)

33 Is your company an Analytical Innovator?
5 questions to assess your current standing:
1. Is my organization open to new ideas that challenge current practice?
2. Does my organization view data as a core asset?
3. Is senior management driving the organization to become more data-driven and analytical?
4. Is my organization using analytical insights to guide the strategy?
5. Are we willing to let analytics help change the way we do business?

34 For most, the journey is just beginning Source: Kiron et al. (2014)

35 How are pacesetters different from dabblers?
Staying on the leading edge (IBM Tech Trends)
They partner more creatively
- 2.0x as likely to engage Citizen Developers for training
- 2.6x as likely to engage Start-ups for steering IT direction
They act on insight, not instinct
They combine technologies
Source: Raising the game: The IBM Business Tech Trends Study

36 "Data Scientist: The Sexiest Job of the 21st Century" - Davenport & Patil, Harvard Business Review

37 The different types of data scientists
4 data scientist clusters & skills groups
ML = Machine Learning, OR = Operations Research
Source: Harris, H. D., Murphy, S. P. & Vaisman, M. (2013): Analyzing the analyzers. An introspective survey of data scientists and their work. O'Reilly, available at: ng-the-analyzers.csp

38 Average salaries by employer type
388 respondents:
- 58.8% US/Canada
- 26.8% Europe
- 7.5% Asia
- 1.8% AU/NZ
- 3.4% Latin America
- 1.8% Africa / Middle East
Note: bar height ~ number of people; bar width ~ salary
Source: KDnuggets 2015 Analytics, Data Mining, Data Science Salary Poll

39 Average salaries by Role Source: KDnuggets 2015 Analytics, Data Mining, Data Science Salary Poll

42 What do Data Scientists Love about their Job?
- Almost 79% are satisfied
- Almost one-third find it totally awesome
Source: CrowdFlower 2015 Data Scientist Report

43 What's holding data scientists back?
- Cleaning data takes too much time
- Poor quality of data
- Unrealistic schedules
- Fuzzy project goals
Source: CrowdFlower 2015 Data Scientist Report

44 Programming and coding skills are much appreciated Source: CrowdFlower 2015 Data Scientist Report

45 Diverse background is important Source: CrowdFlower 2015 Data Scientist Report

46 Wishlist: things that employers should do better Source: CrowdFlower 2015 Data Scientist Report

47 Course: Data Science for Business
Overview

48 Who should participate?
- Business people who will be working with data scientists, managing data science-oriented projects, or investing in data-driven ventures
- Analysts and developers who will be implementing data science solutions
- Aspiring future data scientists
[Graphic labels: Big Data & Analytics, Cloud, Mobile, Social]
Offered in collaboration with

49 Learning objectives
After completing the course, the students will be able to
- identify the role of data as a business asset
- understand the principles of predictive modeling
- recognize how different data science methods can support business decision-making
- apply data science solutions in business problems

50 Course topics
Data Science for Business: business strategy, CRISP-DM, data in motion, data analytic thinking, unsupervised, predictive modeling, SVM, text mining, supervised, decision trees, classification, segmentation, model performance, information visualization, validation, complexity control, evidence and probabilities, overfitting

51 Course Timeline and Functions
Assignments: Preassignment, Assignment 1, Assignment 2, Assignment 3, Assignment 4, Assignment 5
- CS 1. Big data in business
- CS 2. Predictive modeling I
- CS 3. Statistical vs. practical significance
- CS 4. Predictive modeling II
- CS 5. Representing and mining text
- CS 6. Stream computing
Team case: Group meeting 1, Group meeting 2

52 Course workload (6 cr)
Contact sessions
1. Lectures and tutorials (132 x 3h / week): 18h
2. Exercise demos and workshops (2 x 3h / week): 36h
Class preparation: 12h
Assignments: 48h
Team case (course project): 46h

53 References
- Debortoli et al. (2014): Comparing Business Intelligence and Big Data Skills: A Text Mining Study Using Job Advertisements. Business & Information Systems Engineering.
- Harris, H. D., Murphy, S. P. & Vaisman, M. (2013): Analyzing the analyzers. An introspective survey of data scientists and their work. O'Reilly.
- IBM Center for Applied Insights (2014): Raising the game: The IBM Business Tech Trends Study.
- Kiron et al. (2014): The Analytics Mandate. MIT Sloan Management Review, Research report, May.
- Schrage, M. (2014): Learn from Your Analytics Failures. HBR Blog Network, Sep 3.

54 References
- Ross, J., Beath, C. and Quaadgras, A. (2013): You May Not Need Big Data After All. Harvard Business Review.
- NewVantage Partners: Big Data Executive Survey 2013: The State of Big Data in the Large Corporate World, September 9.
- Bladt, J. and Filbin, B. (2014): Who's Afraid of Data-Driven Management? HBR Blog Network, May 16.