Data Warehousing (The Need, Importance & the Big Picture)

1 Data Warehousing (The Need, Importance & the Big Picture) Naveed Iqbal, Assistant Professor NUCES, Islamabad Campus (Lecture Slides Week # 1)

2 Why this Course? The world is changing (in fact, it has already changed): either change or be left behind. Missing opportunities or going in the wrong direction has prevented us from growing. What is the right direction? Harnessing data in the knowledge-driven economy, and doing what cannot be automated or is difficult to automate. NUCES, Islamabad Campus Data Warehousing - Fall

3 The Need of the Time Drowning in data AND/BUT starving for information. Knowledge is power BUT Intelligence is absolute/super power.

4 The Need of the Time (diagram) Hierarchy in order of increasing POWER ($/ ), from bottom to top: Data, Information, Knowledge, Intelligence.

5 Evolution of Information Systems

8 Business Intelligence

10 Visualization

11 Data Warehousing: the big picture
Tier 0 (Data): data sources, i.e. operational databases, archived data, semistructured sources and www data.
Extract, Transform, Load (ETL) moves data from the sources into the warehouse.
Tier 1 (Data Warehouse Server): Data Warehouse, Data Marts and Meta Data.
Tier 2 (OLAP Servers): MOLAP and ROLAP.
Tier 3 (Clients): Query/Reporting, Analysis and Data Mining tools, used by IT users and business users.

13 Approach of the Course Develop an understanding of the underlying RDBMS concepts. Apply these concepts to VLDB / DSS environments and understand where and why they break down. Expose the differences between an RDBMS and a Data Warehouse in the context of VLDB. Provide the basics of DSS tools such as OLAP and Data Mining, and demonstrate their applications. Demonstrate the application of DSS concepts and the limitations of OLTP concepts through lab exercises.

14 Summary of the Course: Introduction & Background; Extract-Transform-Load (ETL); Normalization & De-Normalization; Dimensional Modeling; Online Analytical Processing (OLAP); Data Quality Management (DQM); Need for Speed (Parallelism, Join and Indexing Techniques); DWH Implementation Steps; Complete Implementation Case Study; Lab and Tool Usage.

15 Books
Reference Books:
Golfarelli & Rizzi, Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill.
W. H. Inmon, Building the Data Warehouse, John Wiley & Sons Inc., NY.
R. Kimball, The Data Warehouse Toolkit, John Wiley & Sons Inc., NY.
A. Abdullah, Data Warehousing for Beginners: Concepts & Issues.
Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons Inc., NY.
...

16 Course Execution Plan: Lecturing / Discussions; Lab Work + Tutorials; Assignments / Case Studies; Projects. Marks Breakup: Mid-I: 12%; Quizzes: 6%; Mid-II: 13%; Assignments/Case Study: 9%; Final*: 40%; Projects*: 20%. (* Mandatory; missing means F.)

17 Code of Conduct Regularity: attendance criteria as per university policy. Punctuality: no entry after 5 minutes from class start time (N/A for habitual latecomers). Discipline: ABSOLUTELY NO COMPROMISE. Positive attitude. High level of class participation. No plagiarism or cheating. No change in deadlines. No usage of mobile / other devices.

18 Scenario 1 ABC Pvt Ltd is a company with branches in Karachi, Quetta, Peshawar and Lahore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.

19 Scenario 1: ABC Pvt Ltd. (diagram) The Sales Manager needs sales per item type per branch for the first quarter, but the data sits in four separate branch systems: Karachi, Quetta, Peshawar and Lahore.

20 Solution 1: ABC Pvt Ltd. Extract sales information from each database. Store the information in a common repository at a single site.

21 Solution 1: ABC Pvt Ltd. (diagram) The Karachi, Quetta, Peshawar and Lahore databases feed a single Data Warehouse; Query & Analysis tools on top of it produce the report for the Sales Manager.
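
Solution 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not describe the branch systems, so the `sales` and `sales_fact` table layouts, the item types, the amounts and the year are all made up, and in-memory SQLite databases stand in for the four operational systems and the common repository.

```python
import sqlite3

def make_branch_db(rows):
    """Stand-in for one branch's operational database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (sale_date TEXT, item_type TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

def load_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales_fact "
        "(branch TEXT, sale_date TEXT, item_type TEXT, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT sale_date, item_type, amount FROM sales").fetchall()
        # Tag every row with its branch so the origin survives consolidation.
        wh.executemany(
            "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
            [(branch, d, t, a) for d, t, a in rows],
        )
    wh.commit()
    return wh

def first_quarter_report(wh, year):
    """Sales per item type per branch for the first quarter of `year`."""
    return wh.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales_fact
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),  # ISO dates compare correctly as text
    ).fetchall()

# Made-up sample data, one small database per branch.
branch_dbs = {
    "Karachi": make_branch_db([("2024-01-15", "dairy", 100.0),
                               ("2024-02-10", "dairy", 50.0),
                               ("2024-05-01", "dairy", 999.0)]),  # outside Q1
    "Quetta": make_branch_db([("2024-03-20", "bread", 30.0)]),
    "Peshawar": make_branch_db([("2024-01-05", "meat", 70.0)]),
    "Lahore": make_branch_db([("2024-02-28", "bread", 40.0),
                              ("2024-03-01", "bread", 10.0)]),
}
warehouse = load_warehouse(branch_dbs)
report = first_quarter_report(warehouse, 2024)
```

Once the rows are in one place, the quarterly report is a single query instead of four separate extracts that someone has to merge by hand.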

22 Scenario 2 One Stop Shopping Super Market has a huge operational database. Whenever executives want a report, the OLTP system slows down and the data entry operators have to wait.

23 Scenario 2: One Stop Shopping (diagram) Management's report requests and the Data Entry Operators' work go to the same Operational Database, so the operators wait while a report runs.

24 Solution 2 Extract the data needed for analysis from the operational database and store it in the warehouse. Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis. The warehouse will hold data with a historical perspective.

25 Solution 2 (diagram) Data Entry Operators send transactions to the operational database; data is extracted from it into the Data Warehouse, and the Manager's report is served from the warehouse.
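
The periodic refresh in Solution 2 could look roughly like the sketch below. It is an assumption-laden illustration, not the course's method: the `transactions` and `txn_history` tables are hypothetical, the operational table is assumed to be append-only with an increasing `txn_id`, and in-memory SQLite databases stand in for both systems.

```python
import sqlite3

def refresh_warehouse(oltp, wh):
    """Copy only the transactions added since the previous refresh."""
    # The highest id already in the warehouse acts as a high-water mark.
    last_id = wh.execute(
        "SELECT COALESCE(MAX(txn_id), 0) FROM txn_history"
    ).fetchone()[0]
    new_rows = oltp.execute(
        "SELECT txn_id, txn_time, amount FROM transactions "
        "WHERE txn_id > ? ORDER BY txn_id",
        (last_id,),
    ).fetchall()
    wh.executemany("INSERT INTO txn_history VALUES (?, ?, ?)", new_rows)
    wh.commit()
    return len(new_rows)

# Hypothetical operational database and warehouse.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, txn_time TEXT, amount REAL)"
)
wh = sqlite3.connect(":memory:")
wh.execute(
    "CREATE TABLE txn_history (txn_id INTEGER PRIMARY KEY, txn_time TEXT, amount REAL)"
)

oltp.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2024-01-01T09:00", 20.0), (2, "2024-01-01T09:05", 35.0)],
)
first = refresh_warehouse(oltp, wh)    # initial load copies both rows

oltp.execute("INSERT INTO transactions VALUES (3, '2024-01-02T10:00', 15.0)")
second = refresh_warehouse(oltp, wh)   # next refresh copies only the new row

# Reports now read the warehouse and never touch the operational database.
total = wh.execute("SELECT SUM(amount) FROM txn_history").fetchone()[0]
```

Because each refresh reads only the rows past the high-water mark, the load on the operational database stays small, and the warehouse keeps the accumulated history.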

26 Scenario 3 Cakes & Cookies is a small, new company. The President wants the company to grow and needs information in order to make correct decisions.

27 Solution 3 Improve the quality of data before loading it into the warehouse: perform data cleaning and transformation first. Use query and analysis tools to support ad hoc queries.
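
As a rough illustration of cleaning and transformation before loading, the sketch below normalises a few raw sales records. Everything specific in it is an assumption: the field names (`product`, `units`, `date`), the two accepted date formats, the rule that unrepairable records are dropped, and the rule that records identical after normalisation count as duplicates.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats

def clean_record(raw):
    """Return a normalised record, or None if it cannot be repaired."""
    # Collapse stray whitespace and unify capitalisation of product names.
    product = " ".join(raw.get("product", "").split()).title()
    if not product:
        return None
    try:
        units = int(str(raw.get("units", "")).strip())
    except ValueError:
        return None
    if units < 0:
        return None
    date = None
    for fmt in DATE_FORMATS:
        try:
            date = datetime.strptime(raw.get("date", "").strip(), fmt).date()
            break
        except ValueError:
            continue
    if date is None:
        return None
    return {"product": product, "units": units, "date": date.isoformat()}

def clean_batch(raw_records):
    """Clean every record and drop duplicates that appear after normalisation."""
    cleaned, seen = [], set()
    for raw in raw_records:
        rec = clean_record(raw)
        if rec is None:
            continue
        key = (rec["product"], rec["date"], rec["units"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Made-up raw input with typical quality problems.
raw = [
    {"product": "  chocolate  cake ", "units": "12", "date": "2024-01-05"},
    {"product": "Chocolate Cake", "units": "12", "date": "05/01/2024"},  # same sale, other format
    {"product": "Cookies", "units": "abc", "date": "2024-01-06"},        # unreadable units
    {"product": "", "units": "3", "date": "2024-01-07"},                 # missing product
    {"product": "cookies", "units": " 7 ", "date": "2024-01-08"},
]
cleaned = clean_batch(raw)
```

A real pipeline would usually log or quarantine rejected records rather than silently dropping them; the sketch omits that to stay short.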

28 Solution 3 (diagram) The President uses a Query and Analysis tool over the Data Warehouse, for example a chart of sales against time, to decide on Expansion and Improvement.

29 Case Study AFCO Foods & Beverages is a new company that produces dairy, bread and meat products, with its production unit located in Gujranwala. Its products are sold in all regions of Pakistan, and it has sales units at the provincial headquarters. The President of the company wants sales information.

30 Sales Information Report: The number of units sold: 113. Report: The number of units sold over time (table with columns January, February, March, April).

31 Sales Information Report: The number of items sold for each product over time (table with one row per product: Wheat Bread, Cheese, Swiss Rolls; one column per month: Jan, Feb, Mar, Apr; of the cell values, only 6 and 17 in the Wheat Bread row are legible).

32 Sales Information Report: The number of items sold in each city for each product over time (table with rows grouped by city, Karachi and Lahore, then by product: Wheat Bread, Cheese, Swiss Rolls; one column per month: Jan, Feb, Mar, Apr; of the cell values, only 3 and 7 for Lahore Wheat Bread and 3 and 8 for Lahore Cheese are legible).

33 Sales Information Report: The number of items sold and the income in each region for each product over time (table with rows grouped by city, Karachi and Lahore, then by product: Wheat Bread, Cheese, Swiss Rolls; for each month, Jan to Apr, two columns: Rs for income in rupees and U for units).
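
The four reports above are the same sales data summarised at increasing levels of detail. A minimal sketch of that idea follows; the fact rows are entirely hypothetical (they are not the figures from the slides), and `report` is an illustrative helper, not part of any OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, month, units, rupees).
SALES = [
    ("Karachi", "Wheat Bread", "Jan", 4, 400),
    ("Karachi", "Cheese", "Feb", 2, 600),
    ("Lahore", "Wheat Bread", "Jan", 3, 300),
    ("Lahore", "Cheese", "Mar", 5, 1500),
    ("Lahore", "Swiss Rolls", "Apr", 1, 150),
]

# Position of each dimension inside a fact row.
DIMENSIONS = {"city": 0, "product": 1, "month": 2}

def report(rows, group_by=()):
    """Total (units, rupees) for every combination of the chosen dimensions."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = tuple(row[DIMENSIONS[name]] for name in group_by)
        totals[key][0] += row[3]  # units
        totals[key][1] += row[4]  # rupees
    return {key: tuple(values) for key, values in totals.items()}

# Each call adds one dimension, mirroring the sequence of reports.
total = report(SALES)                                    # units sold overall
by_month = report(SALES, ("month",))                     # ... over time
by_product_month = report(SALES, ("product", "month"))   # ... per product
by_city_product_month = report(SALES, ("city", "product", "month"))  # ... per city
```

Adding a dimension to `group_by` corresponds to drilling down; removing one corresponds to rolling up. The totals stay consistent because every report is computed from the same fact rows.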

34 Data Warehousing includes: building the Data Warehouse; Online Analytical Processing (OLAP); Presentation. (diagram) RDBMS and flat-file sources pass through Cleaning, Selection & Integration into the Warehouse & OLAP server, which serves Presentation on the Client.