Modernizing Your Data Warehouse with Azure


1 Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté

2 Sponsors

3 The traditional BI Environment

4 The traditional data warehouse: "Data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing." (Gartner, The State of Data Warehousing in 2012)

5 The traditional data warehouse: (1) Increasing data volumes; (2) Real-time data; (3) New data sources and types; (4) Cloud-born data

6 Life isn't about waiting for the storm to pass. It's about learning to dance in the rain.

7 The modern data warehouse

8 Microsoft's modern data warehouse SQL Server 2014 PDW Microsoft Azure HDInsight Data Platform

11 Fully managed relational data warehouse-as-a-service: The first elastic cloud data warehouse with enterprise-grade capabilities. Support your smallest to largest data sets.

13 In-memory performance: In-memory Columnstore for next-generation performance. (Figure: Columnstore index representation)
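
The idea behind a columnstore can be sketched in a few lines of Python. This is only an illustration of row-oriented versus column-oriented layout, not SQL Server's actual index format; the sample rows and the `to_columns` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical sales rows, invented for illustration only.
rows = [
    {"id": 1, "region": "East", "amount": 100},
    {"id": 2, "region": "West", "amount": 250},
    {"id": 3, "region": "East", "amount": 75},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot a list of row dictionaries into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole row is visited to total a single field.
row_total = sum(record["amount"] for record in rows)

# Column layout: only the "amount" values are scanned.
column_total = sum(columns["amount"])
```

Both totals are the same; the difference is how much data an aggregate over one field has to touch, which is where column-oriented storage tends to help analytic queries.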

14 Near real-time insights: Real-time with complex event processing. (Diagram: Event Sources, Event Targets)
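
One common building block of complex event processing is a windowed aggregate over a stream. The slide does not name a specific engine or window type, so the following is a minimal sketch assuming a tumbling (fixed, non-overlapping) window; the function name, the event tuples, and the sensor names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[Tuple[int, str], int]:
    """Count events per (window index, key) using fixed-size windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, key in events:
        # Window 0 covers [0, window_seconds), window 1 the next span, etc.
        window_index = int(timestamp // window_seconds)
        counts[(window_index, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, source) events.
events = [
    (0.5, "sensor-a"),
    (3.2, "sensor-b"),
    (9.9, "sensor-a"),
    (10.1, "sensor-a"),
    (14.0, "sensor-b"),
]
counts = tumbling_window_counts(events, window_seconds=10)
```

In a real pipeline the events would arrive from the event sources continuously and each closed window would be pushed to an event target; here the whole stream is a small in-memory list.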

17 "Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time." (Wikipedia)

18 What is Big Data? Many Options Variability

19 What is Big Data? Volume: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18). Velocity and Variety, by era: ERP / CRM; WEB 2.0; Internet of things. Data types shown: Social Sentiment, Click Stream, Mobile Advertising, Sensors / RFID / Devices, eCommerce, Collaboration, Payables, Contacts, Payroll, Deal Tracking, Inventory, Sales Pipeline, Digital Marketing, Search Marketing, Web Logs, Recommendations, Wikis / Blogs, Audio / Video, Log Files, Spatial & GPS Coordinates, Data Market Feeds, eGov Feeds, Weather, Text/Image. Storage/GB by era: ERP / CRM, WEB 2.0, Internet of things.

20 What is Big Data? Common Scenarios

21 Hadoop: Apache Hadoop is for big data. Open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Designed to scale up from single servers to thousands of machines, each offering local computation and storage.
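
The best-known of these simple programming models is MapReduce. The sketch below runs the map, shuffle, and reduce phases of a word count in a single Python process to show the shape of the model; it does not use any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield (word.lower(), 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return (key, sum(values))


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = word_count(["Big data small data", "All data"])
```

On a cluster, each mapper would process the part of the input stored on its own machine and the reducers would run in parallel per key, which is what lets the same small program scale out.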

22 Hadoop: Traditional RDBMS vs. Hadoop, compared on Data Size, Access, Updates, Structure, Integrity, Scaling, and DBA Ratio

23 HDFS: The Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable and reliable data storage and is designed to span large clusters of commodity servers. (Diagram: HDFS, Database)
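
HDFS gets its scalability and reliability by splitting files into blocks and storing copies of each block on several servers. The sketch below illustrates that idea only. The 128 MB block size and replication factor of 3 are assumed here as the usual Hadoop 2.x defaults (the slide does not state them), and the round-robin placement is a simplification: the real NameNode uses a rack-aware policy. `place_blocks` and the node names are hypothetical.

```python
from typing import Dict, List


def place_blocks(
    file_size_mb: int,
    nodes: List[str],
    block_size_mb: int = 128,  # assumed default, not taken from the slides
    replication: int = 3,      # assumed default, not taken from the slides
) -> Dict[int, List[str]]:
    """Map each block index to the nodes that would hold a replica."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    # Ceiling division: a partial final block still occupies a block.
    block_count = -(-file_size_mb // block_size_mb)
    placement: Dict[int, List[str]] = {}
    for block in range(block_count):
        # Simplified round-robin choice of distinct nodes per block.
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


placement = place_blocks(300, ["node1", "node2", "node3", "node4"])
```

With a 300 MB file this yields three blocks, each held by three different nodes, so losing any single server leaves every block readable.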

24 How does it work?

25 How does it work? (Diagram: Runtime; Server, Server, Server, Server)

26 Architecture

27 Hadoop Ecosystem: Pipeline / workflow (Oozie); Event Pipeline (Flume); Monitoring & Deployment (System Center); PowerShell; NoSQL Database (HBase); C#, F#, .NET; Scripting (Pig); Graph (Pegasus); Metadata (HCatalog); Query (Hive); Stats processing (RHadoop); Distributed Processing (MapReduce); World's Data (Azure Data Marketplace); Distributed Storage (HDFS); Azure Storage Vault (ASV); Machine Learning (Mahout); Query/Scripting (Spark); Active Directory (Security); Data Integration (ODBC / Sqoop / REST); Relational (SQL Server); Event Driven Processing; Business Intelligence (Excel, Power View, SSAS). Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages

28 What is Hive? A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Provides a SQL-like language called HiveQL to query data. Integration between Hadoop and BI and visualization tools.
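
To give a feel for what "SQL-like" means, here is a small summarization query of the kind HiveQL is meant for. This sketch runs it against Python's built-in sqlite3 module as a stand-in, not against Hive; the `sales` table and its columns are hypothetical, and the expectation that Hive would accept the same statement for a table of this shape is an assumption that is not verified here.

```python
import sqlite3
from typing import List, Tuple

# A plain GROUP BY summarization; table and column names are invented.
QUERY = """
SELECT region, COUNT(*) AS orders, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""


def summarize(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Load rows into an in-memory table and run the summary query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = summarize([("East", 100), ("West", 250), ("East", 75)])
```

The point of Hive is that an analyst writes a declarative query like this one and the engine turns it into distributed jobs over data stored in Hadoop, instead of the analyst writing those jobs by hand.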

29 What is Pig? Write complex MapReduce jobs using a simple script language (Pig Latin). A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. Pig translates and compiles scripts into MapReduce jobs on the fly.
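
A Pig Latin script is a sequence of named data-flow steps. The sketch below mirrors a typical load, filter, group, and aggregate flow in plain Python so the steps can be followed one by one. It is not Pig and does not generate MapReduce jobs; the Pig Latin in the comment is illustrative only, and the relation names and sample records are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative Pig Latin for the same flow (relation names are hypothetical):
#   logs    = LOAD 'logs' AS (user:chararray, bytes:int);
#   big     = FILTER logs BY bytes > 100;
#   grouped = GROUP big BY user;
#   totals  = FOREACH grouped GENERATE group, SUM(big.bytes);

# LOAD: hypothetical (user, bytes) records held in memory.
logs = [("alice", 120), ("bob", 80), ("alice", 300), ("bob", 150)]

# FILTER: keep only records above the threshold.
big = [record for record in logs if record[1] > 100]

# GROUP: groupby needs its input sorted by the grouping key.
grouped = groupby(sorted(big, key=itemgetter(0)), key=itemgetter(0))

# FOREACH ... GENERATE: one total per user.
totals = {user: sum(size for _, size in records) for user, records in grouped}
```

Each intermediate name corresponds to one relation in the script, which is what makes this style easier to read than a hand-written chain of map and reduce functions.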

30 Data Flow: Data, Hadoop, Analytics

31 Capabilities: Extract, Load, Transform; Distributed Compute; Predictive Analysis; Machine Learning; Graph Processing

32 IT infrastructure optimization; Legal discovery; Social network analysis; Traffic flow optimization; Web app optimization; Churn analysis; Natural resource exploration; Weather forecasting; Healthcare outcomes; Fraud detection; Life sciences research; Advertising analysis; Equipment monitoring; Smart meter monitoring

34 Features and benefits: Analyze unstructured data in Excel. Combine different types of data with Power Query/Power BI. Analyze your data with Power Pivot and Power BI.

35 Features and benefits: Build a cluster in minutes and tear it down when you're done. Optimize cluster size for time to insight or cost savings.

36 Try HDInsight. Try SQL Server for data warehousing in Microsoft Azure VMs. Try Hortonworks Data Platform for Windows at www.hortonworks.com. Try SQL Server 2017.

38 Resources: Apache Projects (list with links); Microsoft Azure HDInsight; HDInsight Documentation & Tutorials; Hortonworks Sandbox 2.2 & Tutorials; Cloudera VMs CDH 5.3.x; Microsoft JDBC Driver for SQL Server; Microsoft Hive ODBC Driver; Getting Started with Big Data (MVA); Big Data and Business Analytics Immersion v3.1 (MVA); Introducing Microsoft Azure HDInsight (free e-book)