Machine Learning and Analytics. Machine Learning. Data Lake Analytics. HDInsight (Hadoop, Spark, Storm, HBase Managed Clusters) Stream Analytics


1

2

3

4 Overview of the Microsoft Cloud Data Platform

5 Data Sources: Apps; Sensors and devices
Information Management: Data Factory; Data Catalog; Event Hubs
Big Data Stores: Data Lake Store; SQL Data Warehouse
Machine Learning and Analytics: Machine Learning; Data Lake Analytics; HDInsight (Hadoop, Spark, Storm, HBase Managed Clusters); Stream Analytics
Intelligence: Cognitive Services; Bot Framework; Cortana
Dashboards & Visualizations: Power BI
People: Web; Apps; Mobile; Bots
Automated Systems
Flow: Data, Intelligence, Action

6 From data to decisions and actions
Descriptive [Reports]: What happened?
Diagnostic [Interactive Dashboards]: Why did it happen?
Predictive [Machine Learning]: What will happen?
Prescriptive [Recommendations & Automation]: What should I do?
Insight

7 Azure Data Lake Store: a highly scalable, distributed, parallel file system in the cloud, specifically designed to work with a variety of big data analytics workloads
ADL Analytics: Batch (U-SQL)
HDInsight: Batch (Map Reduce); Script (Pig); SQL (Hive); NoSQL (HBase); In-Memory (Spark); Predictive (R Server)
Data sources: Devices; Social; LOB Applications; Web; Video; Sensors; Relational; Clickstream

8 About Azure HDInsight

9 Microsoft Hadoop Stack: Azure HDInsight
Analytics
Machine Learning
Storage: Local (HDFS) or Cloud (Azure Blob / Azure Data Lake Store)

10 Azure HDInsight: Hadoop and Spark as a Service on Azure
Fully managed Hadoop and Spark for the cloud
100% open source Hortonworks Data Platform
Clusters up and running in minutes
Supported by Microsoft with the industry's best SLA
Familiar BI tools for analysis
Open source notebooks for interactive data science
63% lower TCO than deploying Hadoop on-premises*
*IDC study: "The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight"

11 ODBC

12 Multi-user authentication: Kerberos; Azure Active Directory
Perimeter-level security: Virtual Network; network security (i.e. firewalls); Gateway service
Authorization using Apache Ranger: Hive policies; HBase policies; file- and folder-level ACLs on ADLS
Data security: encryption at rest supported on both Azure Storage Blob and ADLS

13 HDInsight Case Studies

14

15

16 About HDInsight - Hive

17

18 Ad-hoc drill-down: BI tools (Tableau, Excel)
Continuous ingestion from operational DB
Slowly changing dimensions
Multidimensional analytics: MDX tools, Excel
Legend: Existing; Development; Emerging
Layers: Platform; Core SQL Engine; Connectivity

19

20 Hadoop cluster: SDK, PowerShell
Interactive Hive cluster (new): JDBC, ODBC, Visual Studio, Hue, Ambari

21 Demo: HDInsight cluster & Hive

22 Enterprise Data Warehouse Based on HDInsight Hive

23

24

25

26 Always-on cluster (Persistent) vs. cluster as a service (On demand)
Storage choice: Persistent: Local HDFS, Azure Blob, Azure Data Lake Store. On demand: Azure Blob, Azure Data Lake Store.
Job scheduling: Persistent: Oozie. On demand: Azure Data Factory.
Data persistence after cluster deletion: Persistent: N/A. On demand: Azure Blob, Azure Data Lake Store.
Metadata persistence after cluster deletion: Persistent: N/A. On demand: Azure SQL.
Billing: Persistent: billing for the entire time the cluster is up. On demand: billing per job; pay only for the time the cluster was actually used.
Since data and metadata are persisted, the experience is as if the cluster was never deleted.
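The billing difference between the two models can be sketched with simple arithmetic. The hourly rate, the 30-day month, and the job durations below are hypothetical inputs chosen only for illustration; they are not Azure prices, and the sketch ignores storage costs and cluster start-up time.

```python
from typing import Sequence


def persistent_cost(hourly_rate: float, hours_cluster_up: float) -> float:
    """Persistent cluster: billed for the entire time the cluster is up."""
    return hourly_rate * hours_cluster_up


def on_demand_cost(hourly_rate: float, job_hours: Sequence[float]) -> float:
    """On-demand cluster: billed only for the hours each job actually used."""
    return hourly_rate * sum(job_hours)


# Hypothetical inputs: 10 currency units per cluster-hour, a 30-day month,
# and one 2-hour batch job per day.
RATE = 10.0
always_on = persistent_cost(RATE, hours_cluster_up=24 * 30)
per_job = on_demand_cost(RATE, job_hours=[2.0] * 30)

print(f"persistent: {always_on}, on demand: {per_job}")
```

Under these assumed numbers the on-demand model is cheaper because the cluster is idle most of the day; a cluster that is busy around the clock would cost about the same either way.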

27

28 Optimization Summary
Choose from dozens of VMs and scale out capability to increase parallelism
Choose the Tez execution engine
Avoid reading entire partitions by breaking files into pieces
Columnar format supported by Hive, which also allows you to use ACID and LLAP
Enables Hive to process 1024 rows at one time to make execution faster
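Several of these optimizations map onto Hive session settings and table DDL. The following is a minimal sketch that only assembles the statements as text. It assumes the columnar format meant is ORC and that the 1024-row batching refers to Hive's vectorized execution. The table and column names are hypothetical, and the extra properties needed for ACID tables are version dependent and not shown.

```python
from typing import List, Tuple


def hive_session_settings() -> List[str]:
    """Session-level settings for the Tez engine and vectorized execution."""
    return [
        "SET hive.execution.engine=tez;",
        "SET hive.vectorized.execution.enabled=true;",
    ]


def orc_table_ddl(
    table: str,
    columns: List[Tuple[str, str]],
    partition: Tuple[str, str],
) -> str:
    """Build a CREATE TABLE statement for a partitioned, ORC-backed table."""
    column_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    part_name, part_type = partition
    return (
        f"CREATE TABLE {table} ({column_sql}) "
        f"PARTITIONED BY ({part_name} {part_type}) "
        "STORED AS ORC;"
    )


# Hypothetical table and column names, used only for illustration.
ddl = orc_table_ddl(
    "sales",
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    ("order_date", "STRING"),
)
script = "\n".join(hive_session_settings() + [ddl])
print(script)
```

The generated script could then be submitted through any of the query authoring tools the deck mentions; the sketch itself does not connect to a cluster.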

29

30 Demo: Query Authoring Tools
Demo: 100 GB query with Batch

31 Azure official website and SDKs for Azure in China: official information, solutions, and documentation
Developer Notes for Azure in China Applications: differences developers should be aware of between Global and China Azure
1 RMB Trial:
Azure Marketplace in China:
Microsoft 云科技 official account
Azure 云助手 mobile app

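32 Azure China official website: the latest product and solution information, technical documentation, and SDK downloads
Azure application development notes: an overview of the differences developers need to be aware of between the overseas and China services
Apply for the 1 RMB trial and experience Azure services right away:
Azure Marketplace:
Microsoft 云科技 official account
Azure 云助手 mobile app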

33

34

35

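36 Top-level project: Apache Kylin is the only Apache top-level open source project from China; its core developers and contributors are all in China
Industry recognition: winner of the InfoWorld award for best open source big data tools two years in a row, this year alongside Google TensorFlow
User recognition: more than 100 large companies in China and abroad, across many industries, officially use Kylin as their big data analytics platform solution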

37

38

39

40

41 Kylin's O(1) algorithm makes query performance independent of dataset size
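The idea behind this claim can be illustrated with a toy pre-aggregated cube: every combination of dimensions is summed once at build time, so a query becomes a single dictionary lookup whose cost does not grow with the number of raw rows. This is a sketch of the general pre-aggregation technique, not Kylin's actual implementation; the dimension names and rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

DIMENSIONS = ("country", "product")  # hypothetical dimension names

CubeKey = Tuple[Tuple[str, ...], Tuple[str, ...]]


def build_cube(rows: List[dict]) -> Dict[CubeKey, float]:
    """Pre-aggregate the 'amount' measure for every subset of dimensions."""
    cube: Dict[CubeKey, float] = defaultdict(float)
    for row in rows:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row["amount"]
    return dict(cube)


def query(cube: Dict[CubeKey, float], **filters: str) -> float:
    """Answer SUM(amount) for equality filters with one dictionary lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube.get((dims, tuple(filters[d] for d in dims)), 0.0)


rows = [
    {"country": "CN", "product": "A", "amount": 10.0},
    {"country": "CN", "product": "B", "amount": 5.0},
    {"country": "US", "product": "A", "amount": 7.0},
]
cube = build_cube(rows)

print(query(cube))                             # grand total
print(query(cube, country="CN"))               # total for one country
print(query(cube, country="CN", product="A"))  # one cell of the cube
```

The trade-off is that build time and storage grow with the data volume and with the number of dimension combinations, which is why the work is done ahead of time rather than at query time.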

42 Massive data, extreme performance, extreme concurrency

43 Large-scale data analysis, no coding required

44

45

46 Kylin server; HDInsight; Virtual network; Blob Storage; Resource Group; Azure Resource Manager

47 Azure: a mature cloud computing platform
HDInsight: automatic scaling
Power BI: self-service visual BI
Apache Kylin: high performance + high concurrency + standard SQL

48

49

50

51

52