Big data is hard. Top 3 Challenges To Adopting Big Data



2 Big data is hard. Top 3 Challenges To Adopting Big Data


4 Traditionally, analytics have been over pre-defined structures
Data characteristics: Sales, Customer, Product
Questions answered with BI and visualizations

5 To innovate, new types of data and analytics are needed
Data characteristics: Sales, Customer, Product
Petabytes
Data complexity: variety and velocity
Questions from exploratory analytics

6 Two Approaches to Analytics
Top-Down: Theory, Hypothesis, Observation, Confirmation
Bottom-Up: Observation, Pattern, Hypothesis, Theory
What happened? Descriptive Analytics
Why did it happen? Diagnostic Analytics
What will happen? Predictive Analytics
How can we make it happen? Prescriptive Analytics

7 Traditional business analytics process
1. Start with end-user requirements to identify desired reports and analysis
2. Define corresponding database schema and queries
3. Identify the required data sources
4. Create an Extract-Transform-Load (ETL) pipeline to extract required data (curation) and transform it to target schema ("schema-on-write")
5. Create reports. Analyze data
Diagram labels: LOB Applications, ETL pipeline, Dedicated ETL tools (e.g. SSIS), Relational, Defined schema, Queries, Results
All data not immediately required is discarded or archived
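
The five steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the pipeline from the slides: the CSV export, the `sales` table, and the function names are hypothetical, and Python's built-in `sqlite3` stands in for the relational store and for a dedicated ETL tool such as SSIS.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a line-of-business application.
RAW_CSV = """order_id,customer,product,amount,browser,session_ms
1,Alice,Widget,19.99,Firefox,5321
2,Bob,Gadget,5.00,Chrome,120
3,Alice,Gadget,5.00,Edge,998
"""


def run_etl(raw_csv: str, conn: sqlite3.Connection) -> None:
    """Extract rows, transform them to the target schema, and load them."""
    # Schema-on-write: the target schema is fixed before any data is loaded.
    conn.execute(
        "CREATE TABLE sales ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "product TEXT, amount_cents INTEGER)"
    )
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Transform: store the amount as integer cents. Fields the reports
        # do not need (browser, session_ms) are dropped at this point.
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            (
                int(row["order_id"]),
                row["customer"],
                row["product"],
                round(float(row["amount"]) * 100),
            ),
        )
    conn.commit()


def sales_by_customer(conn: sqlite3.Connection):
    """Report query written against the predefined schema."""
    return conn.execute(
        "SELECT customer, SUM(amount_cents) FROM sales "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
report = sales_by_customer(conn)
```

Once the load has run, the `browser` and `session_ms` fields are gone from the store, so a later question about browsers cannot be answered without going back to the source system.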

8 New big data thinking: All data has value
All data has potential value
Data hoarding
No defined schema; data is stored in native format
Schema is imposed and transformations are done at query time (schema-on-read)
Apps and users interpret the data as they see fit
Diagram labels: Gather data from all sources, Store indefinitely, Analyze, See results, Iterate
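
Schema-on-read can be illustrated with a small sketch. The event records, the field names, and the `read_with_schema` helper are hypothetical; the point is only that the raw data stays untouched and each consumer applies its own schema while reading.

```python
import json

# Hypothetical raw events kept in their native format (JSON lines).
# Nothing is dropped or reshaped at write time.
RAW_EVENTS = [
    '{"type": "sale", "customer": "Alice", "amount": "19.99", "browser": "Firefox"}',
    '{"type": "click", "customer": "Bob", "page": "/pricing"}',
    '{"type": "sale", "customer": "Bob", "amount": "5.00"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) while reading.

    Records that lack one of the schema's fields are skipped for this query
    only; they remain in the raw store for other queries.
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: convert(record[field]) for field, convert in schema.items()}


# Two consumers interpret the same raw data as they see fit.
sales_schema = {"customer": str, "amount": float}
browser_schema = {"customer": str, "browser": str}

sales = list(read_with_schema(RAW_EVENTS, sales_schema))
browsers = list(read_with_schema(RAW_EVENTS, browser_schema))
```

Because the browser field was never discarded, the second query can be added later without re-ingesting anything.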

9 The Microsoft Data Platform
Data capabilities: Capture + manage, Transform + analyze, Visualize + decide


11 Azure Data Platform
Architecture diagram. Components shown: Hadoop, Apps (iOS/Android), SQL MPP/APS, On-Premises, Cloud Services, Worker Role, storage blob, storage table, storage queue, File Data, MPLS ExpressRoute, On-Premises VPN, VPN Gateway, Data Management Gateway, Cloud Gateway, SQL Data Sync, Data Management Service, Cortana Analytics Suite, Enterprise Data, Transactional Data, Log Data, IoT, Device Data, Stream Data, Logic Apps, Azure Batch, Data Factory, Event Hub, Stream Analytics, HDInsight (Hadoop), Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Machine Learning, Power BI, Azure Data Catalog, MySQL Database, DocDB

12 Introducing Microsoft Azure Data Lake
Diagram labels: Analytics Service, U-SQL, HDInsight, YARN, HDFS, Store

13 Product Details
Azure Data Lake Store
Azure Data Lake Analytics service
Azure HDInsight

14 Introducing Azure Data Lake Store
No fixed limits on file size (PB file sizes)
Designed for diversity of analytic workloads
Accessible to all HDFS compliant analytic applications (Hortonworks, Cloudera, MapR)
Managed, monitored, and supported by Microsoft
Enterprise grade features around security, compliance & management

15 Azure Data Lake Analytics Service
Distributed analytics service
Dynamically scales to meet your business needs
Productive day one with industry leading development tools (for novices & experts)
Analytics over all data (unstructured, semistructured, structured)
U-SQL: simple and familiar, easily extensible
Hive coming soon
Built on open standards (YARN)

16 Azure HDInsight becomes key part of Data Lake
Microsoft's cloud Hadoop offering
100% open source Apache Hadoop
Fully managed and supported by Microsoft
Spark, Hive, Pig, Storm, HBase
Up and running in minutes with no hardware
.NET and Java skills
Deep integration to Visual Studio
99.9% Enterprise Service Level Agreement
Use Windows or Linux

17 Azure HDInsight Includes Spark
Single execution model for multiple tasks (SQL queries, streaming, machine learning, and graph)
Processing with up to 100x faster performance
Developer friendly (Java, Python, Scala)
BI tool of choice (Power BI, Tableau, Qlik, SAP)
Notebook experience (Jupyter/IPython, Zeppelin)

18 Azure HDInsight Includes Storm
Consumes millions of real-time events from a scalable event broker (e.g. Apache Kafka, Azure Event Hub)
Performs time-sensitive computation
Output to persistent stores, dashboards or devices
Customizable with Java + .NET
Deeply integrated to Visual Studio
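
The consume, compute, output pattern above can be shown with a toy example. This sketch does not use the Storm API; it is plain Python, with a list standing in for the event broker and a hypothetical `tumbling_window_counts` function doing one kind of time-sensitive computation (counting events per key in fixed time windows).

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) for each window as soon as it closes.

    `events` is an iterable of (timestamp_seconds, key) pairs in time order,
    standing in for a stream read from an event broker.
    """
    current_window = None
    counts = Counter()
    for timestamp, key in events:
        window = timestamp - (timestamp % window_seconds)
        if current_window is not None and window != current_window:
            # The previous window is complete: emit it to the sink.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[key] += 1
    if current_window is not None:
        # Flush the last window when the stream ends.
        yield current_window, dict(counts)


# Hypothetical device events: (timestamp in seconds, device id).
events = [
    (0, "sensor-a"),
    (3, "sensor-b"),
    (9, "sensor-a"),
    (12, "sensor-a"),
    (25, "sensor-b"),
]
results = list(tumbling_window_counts(events, window_seconds=10))
```

In a real deployment the results would be written to a persistent store, dashboard, or device rather than collected into a list.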

19 Azure HDInsight Includes HBase
Columnar, NoSQL database
Runs on top of the Hadoop Distributed File System (HDFS)
Provides flexibility in that new columns can be added to column families at any time
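
The point about adding columns to column families at any time can be made concrete with a toy in-memory model. This is not the HBase client API; the `ToyColumnFamilyTable` class and its methods are hypothetical and only mimic the data model, in which column families are declared with the table while individual columns within a family are not.

```python
from collections import defaultdict


class ToyColumnFamilyTable:
    """Toy in-memory model of a column-family table (not the HBase API)."""

    def __init__(self, column_families):
        # Column families are declared when the table is created.
        self.column_families = set(column_families)
        # row key -> "family:qualifier" -> value
        self.rows = defaultdict(dict)

    def put(self, row_key, family, qualifier, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted: a new column needs no schema change.
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))


table = ToyColumnFamilyTable(["profile", "activity"])
table.put("user1", "profile", "name", "Alice")
table.put("user2", "profile", "name", "Bob")
# A column no other row has, added without altering the table.
table.put("user2", "profile", "twitter", "@bob")
```

Rows in the same table can therefore carry different sets of columns, and a row that never received a given column simply has no entry for it.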

20 ADL Store: Ingress
Data can be ingested into Azure Data Lake Store from a variety of sources
Sources shown: Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), Azure Storage Blobs, On-premises databases, Server logs, Azure Event Hub, Custom programs
Tools and services shown: Built-in copy service, Azure Data Factory, Apache Sqoop, Apache Flume, ADL Store .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell

21 ADL Store: Egress
Data can be exported from Azure Data Lake Store into numerous targets/sinks
Targets shown: Azure SQL DB, Azure SQL DW, Azure Tables (Table Storage), Azure Storage Blobs, On-premises databases, Custom programs
Tools and services shown: Built-in copy service, Azure Data Factory, Apache Sqoop, ADL Store .NET SDK, JavaScript CLI, Azure Portal, Azure PowerShell


23 Get Started Sign up

24 Learn More
