Big Data Initiatives in China: Opportunities and Challenges


Big Data Initiatives in China: Opportunities and Challenges. Joshua Zhexue Huang, Distinguished Professor, Director of Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University

Agenda 1. Recent Development of Big Data in China 2. Key Initiatives, Challenges and Opportunities 3. Research and Applications at Big Data Institute, Shenzhen University

What is Big Data? Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them (Wikipedia). Big data often refers to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.

Big Data Term and Popularity: The term "Big Data" was coined in 1998 by John R. Mashey, Chief Scientist of SGI. At that time the term referred to data sizes in gigabytes that would put stress on infrastructure. On March 29, 2012, the Obama Administration announced the Big Data Research and Development Initiative, with $200 million to invest in big data, which made the term popular.

Recent Development of Big Data in China: China NSF funded key projects. 2010: Massive data mining on cloud computing. 2013: Big data oriented machine learning theory and methods. 2014: Challenging research problems in big data technology and applications (8 projects). 2015: Five projects on big data. 2016: More projects funded in information science and management areas.

Recent Development of Big Data in China: In August 2012, the Chinese Academy of Sciences started a strategic pilot project (1.3 billion in 5 years), "Sensing China oriented next generation information technologies", with a subproject on big data: research and development of key technologies for sea and cloud data systems.

Recent Development of Big Data in China: In 2016, the Ministry of Science and Technology of China started a special program on cloud computing and big data, which will accomplish 12 tasks in four areas with 400 million RMB: (1) cloud platform and big data infrastructure; (2) data-driven new software on the cloud service model; (3) big data analytics, applications and human-like intelligence; (4) cloud convergence of perceptual cognition and human-machine interaction.

Recent Development of Big Data in China: Ministry of Education of China. 85 universities have set up a new major in data science and big data technology. Some major universities have set up special schools, faculties and research institutes on data science and big data. Tsinghua University: Tsinghua-Qingdao Data Science Institute; Peking University: Beijing University Big Data Technology Inst; Fudan University: School of Data Science; Sun Yat-Sen University: School of Data and Computer Science; Shenzhen University: Big Data Institute.

Recent Development of Big Data in China: Local governments have set up special organizations to promote big data. Beijing: Beijing Institute of Big Data Research; Guangdong Province: Big Data Bureau; Shanghai: Shanghai Data Exchange Center; Shenzhen: Shenzhen Research Institute of Big Data, Chinese University of Hong Kong (Shenzhen).

Recent Development of Big Data in China: Industry. The big Internet companies, Baidu, Alibaba and Tencent (BAT), are the leaders in big data development and applications. They are also big data owners. All industry sectors are interested in big data: technology companies (e.g., Huawei, ZTE); telecommunications (e.g., China Mobile, China Unicom); banks and insurance companies; manufacturing companies; e-commerce companies; logistics service companies.

Big Data Market in China [Chart: market size (unit: 0.1 billion) and compound annual growth rate]

Big data: a national strategy. The decision to implement a national strategy for big data was made at the Fifth Plenary Session of the 18th Central Committee of the CPC in October 2015. The 13th Five-Year Plan (2016-2020) further defined big data as a fundamental strategic resource to be developed and utilized. National big data centers and platforms will be established. Key technologies, hardware and software will be innovated and developed, including data collection, storage, cleansing, analysis, mining, visualization, security and privacy protection.

Implementation Measures: The State Council issued the action outline to promote the development of big data in 2015. In January 2016, the National Development and Reform Commission issued a notice on organizing the implementation of major projects to promote the development of big data, supporting projects in four areas: pilot projects on big data applications; big data sharing; big data infrastructure development; big data standards and exchange systems.

Agenda: 2. Key Initiatives, Challenges and Opportunities

Initiatives to Develop Innovation Driven Economy in China: Encourage young people to start their own businesses and pursue innovation (mass entrepreneurship and innovation); development of big data; Internet + action plan; cloud computing service development; Internet of Things (including wireless Internet); artificial intelligence; Made in China 2025 (advanced manufacturing).

Directions: data science disciplines; key technology development; big data platforms; key applications; data resource development; data sharing and open data; human resource training for big data.

Internet + Manufacturing [Diagram labels: AI, manufacturing, procurement, design, customer service, intelligent warehouse, retail, transportation]

Technological Challenges: storage (cloud storage); communication (4G, 5G); processing (cleansing, integration); analysis (capability, efficiency); mining (methods, tools, platforms); energy consumption.

Application Challenges: lack of clear business requirements; lack of successful pilots; data availability and data sharing; data security and privacy; ROI on big data applications; infrastructure; skills and human resources.

Opportunities: Big Data Industry Chain. Telecom, retail, finance, manufacturing, Internet, smart grid, e-commerce, logistics, smart city.

Agenda: 3. Research and Applications at Big Data Institute, Shenzhen University

Shenzhen

China's first Special Economic Zone (SEZ), neighboring Hong Kong. A major city in South China. Area: 2,050 km². Population (2014): 11 million. The fourth largest city by GDP in China. GDP per capita: USD 25,038. GDP growth (2015): 8.9%. [Photos: Shenzhen University, Xichong Beach, Shenzhen Bay Bridge, night view of Shennan Road East]

Shenzhen University: a public university established in 1983. The fastest growing university in the top 100 universities in China. 26 schools (colleges), 57 undergraduate programs, 70 master's programs, 3 doctorate programs. 34,000 full-time students: 27,000 undergraduates, 6,000 postgraduates, 1,500 international students. [Photos: Lake Wenshan, south pavilion of the school library]

Big Data Institute, Shenzhen University: established in 2014; 20 research staff; 30 students; three organizations; international PhD students. [Photos: Computer Science Building, institute corridor]

Faculty Members

Data Center

Internet + Manufacturing accumulates big data [Diagram labels: AI, manufacturing, procurement, design, customer service, intelligent warehouse, retail, transportation]

Research Problems: Challenge of the Big Data Matrix. Millions of records (rows 1, 2, ..., n) by thousands of features (columns f1, f2, f3, ...) lead to the curse of dimensionality. Data issues: 1. Mixed data 2. Noise/missing values 3. Correlation 4. Unbalance 5. Subspace property 6. Uninformative

Big Data Analytics: Big data refers to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data.

MapReduce Programming (Divide-and-Conquer) [Diagram: the input file is partitioned into blocks distributed over worker nodes; a master node coordinates the Map phase on the nodes and the Reduce phase, which writes the output file]
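The divide-and-conquer flow in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions, not Hadoop code: the function names (`partition`, `map_block`, `shuffle`, `word_count`) and the word-count task are hypothetical choices, and the map calls run sequentially here where a cluster would run them in parallel on worker nodes.

```python
from collections import defaultdict

def partition(records, n_blocks):
    """Divide: split the input into n_blocks blocks (the file partition step)."""
    return [records[i::n_blocks] for i in range(n_blocks)]

def map_block(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]

def shuffle(mapped_blocks):
    """Group intermediate values by key between the map and reduce phases."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def word_count(lines, n_blocks=3):
    blocks = partition(lines, n_blocks)
    mapped = [map_block(b) for b in blocks]  # one map task per block
    grouped = shuffle(mapped)
    # Reduce: combine all values collected for each key.
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data in China", "big data platforms", "data mining on cloud"]
counts = word_count(lines)
```

The result does not depend on how many blocks the input is split into, which is what makes the map phase safe to distribute.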

MapReduce Iteration: K-means. Pipeline implementation as a sequence of Map (M) and Reduce (R) jobs. [Diagram: input data -> Map process: assign objects to clusters -> Reduce process: recompute cluster centers -> converged? If not, repeat; if so, output]
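A minimal sketch of this iteration in plain Python follows, assuming Euclidean distance and a small hand-made data set; the helper names (`kmeans_map`, `kmeans_reduce`) and the stopping tolerance are hypothetical. Each pass through the loop stands for one Map/Reduce job in the pipeline.

```python
from collections import defaultdict

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_map(block, centers):
    """Map process: assign each object in a block to its nearest center."""
    pairs = []
    for point in block:
        cid = min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))
        pairs.append((cid, point))
    return pairs

def kmeans_reduce(assignments, centers):
    """Reduce process: recompute each center as the mean of its objects."""
    groups = defaultdict(list)
    for cid, point in assignments:
        groups[cid].append(point)
    new_centers = list(centers)  # a center with no objects keeps its position
    for cid, pts in groups.items():
        dim = len(pts[0])
        new_centers[cid] = tuple(sum(p[d] for p in pts) / len(pts) for d in range(dim))
    return new_centers

def kmeans(blocks, centers, max_iter=20, tol=1e-9):
    for iteration in range(1, max_iter + 1):
        assignments = [pair for block in blocks for pair in kmeans_map(block, centers)]
        new_centers = kmeans_reduce(assignments, centers)
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # the "converged?" check
            break
    return centers, iteration

blocks = [[(1.0, 1.0), (1.0, 2.0)], [(9.0, 9.0), (10.0, 9.0)], [(2.0, 1.0), (9.0, 10.0)]]
centers, n_iter = kmeans(blocks, [(0.0, 0.0), (5.0, 5.0)])
```

Because the centers from one job are the input of the next, the jobs cannot overlap, and every iteration rereads the full data set.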

MapReduce Limitation: Decision Tree. It is difficult to implement recursive algorithms such as decision trees in MapReduce.

Spark RDD Computing Model: an RDD (Resilient Distributed Dataset) is viewed as a matrix.

RDD Divide-and-Conquer
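The idea can be illustrated without Spark: treat the data set as a matrix whose rows are split into partitions, compute a partial result on each partition, and combine the partial results. The sketch below is plain Python under that assumption, with hypothetical helper names and column means as an arbitrary example task; in Spark itself the analogous steps would use RDD operations such as `mapPartitions` and `reduce`.

```python
from functools import reduce

def split_into_partitions(rows, n_partitions):
    """Divide: split the rows of the matrix into contiguous partitions."""
    size = -(-len(rows) // n_partitions)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def partial_stats(partition):
    """Conquer one partition: column sums and row count."""
    n_cols = len(partition[0])
    sums = [sum(row[c] for row in partition) for c in range(n_cols)]
    return sums, len(partition)

def combine(a, b):
    """Merge two partial results into one."""
    sums_a, n_a = a
    sums_b, n_b = b
    return [x + y for x, y in zip(sums_a, sums_b)], n_a + n_b

def column_means(rows, n_partitions=3):
    parts = split_into_partitions(rows, n_partitions)
    sums, n = reduce(combine, (partial_stats(p) for p in parts))
    return [s / n for s in sums]

matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
means = column_means(matrix)
```

Only the small partial results (sums and counts) need to move between nodes; the rows themselves stay where they are stored.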

Asymptotic Ensemble Learning Framework

Randomization of Data Blocks [Figure panels: before randomization; after randomization]
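The slides do not describe the randomization procedure, so the following is only one possible sketch: shuffle the records globally and then cut them into blocks, so that each block behaves like a random sample of the whole data set. The function names and the sorted toy data are assumptions made for illustration.

```python
import random

def split_blocks(records, n_blocks):
    """Cut the records into n_blocks consecutive blocks of equal size."""
    size = len(records) // n_blocks
    return [records[i * size:(i + 1) * size] for i in range(n_blocks)]

def randomize_blocks(records, n_blocks, seed=0):
    """Shuffle the records first, then cut them into blocks."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return split_blocks(shuffled, n_blocks)

def block_means(blocks):
    return [sum(block) / len(block) for block in blocks]

records = list(range(1000))  # stored in sorted order, as raw data often is
before = block_means(split_blocks(records, 10))
after = block_means(randomize_blocks(records, 10))

# Spread of the per-block means: large when blocks follow the storage order,
# much smaller when every block is a random sample of the whole data set.
spread_before = max(before) - min(before)
spread_after = max(after) - min(after)
```

Before randomization each block covers only a narrow slice of the value range; afterwards every block mean is close to the global mean, so a model learned from a few blocks is less biased by storage order.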

Asymptotic Ensemble Learning Results: learning result from non-randomized data blocks.

Advantages of Asymptotic Ensemble Learning: Sampling without replacement. Sampling data blocks instead of records increases sampling efficiency. Learning from partial data (10-20%) approaches the result learned from the whole data. Significantly reduces the computation load. Scalability: learning from TB or PB data.
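A toy sketch of block-level ensemble learning is given below. It is not the framework from the slides, whose details are not given here: the synthetic two-class data, the nearest-centroid base learner, majority voting, and the choice of 8 out of 50 blocks (16%) are all assumptions made for illustration. The synthetic records are generated in random order, so consecutive blocks already behave like randomized blocks.

```python
import random

def make_data(n, seed):
    """Synthetic 1-D records: class 1 centered at +2, class 0 at -2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        data.append((rng.gauss(center, 1.0), label))
    return data

def split_blocks(records, n_blocks):
    size = len(records) // n_blocks
    return [records[i * size:(i + 1) * size] for i in range(n_blocks)]

def fit_centroids(block):
    """Base learner: the per-class mean of the single feature."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in block:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums if counts[c] > 0}

def predict(model, x):
    return min(model, key=lambda c: abs(x - model[c]))

def ensemble_predict(models, x):
    votes_for_1 = sum(predict(m, x) for m in models)
    return 1 if 2 * votes_for_1 > len(models) else 0  # ties go to class 0

def accuracy(models, data):
    hits = sum(ensemble_predict(models, x) == y for x, y in data)
    return hits / len(data)

train = make_data(10000, seed=0)
holdout = make_data(2000, seed=1)
blocks = split_blocks(train, 50)

# Sample whole blocks without replacement: 8 of 50 blocks is 16% of the data.
sampled = random.Random(42).sample(blocks, 8)
ensemble = [fit_centroids(b) for b in sampled]

acc_ensemble = accuracy(ensemble, holdout)
acc_full = accuracy([fit_centroids(train)], holdout)  # model learned from all data
# Accuracy as base models are added one at a time.
curve = [accuracy(ensemble[:k], holdout) for k in range(1, len(ensemble) + 1)]
```

Comparing `acc_ensemble` with `acc_full` shows how close the partial-data ensemble gets to the model trained on everything, and `curve` shows how the accuracy changes as more blocks are added.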

Integrated Big Data Analysis Platform

Key Technologies Workflow Engine Cloud Computing Engine Algorithm Library Big Data Analytics Open API Cloud Storage

Distributed Machine Learning Algorithm Libraries
MapReduce. Categories: Clustering, Classification, Regression, Association. Algorithms: K-Means, K-Modes, W-K-Means, EWKM, Decision Tree, Random Forests, LDA, Logistic Regression, Random Forest Regression, FP-Growth.
Spark. 1. Machine learning: MLlib 2. Graph analysis: GraphX 3. Data streams: DStream 4. Query: Spark SQL

Analytical Workflow

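Manufacturing Big Data Application: Product batch quality problem monitoring system
[Architecture diagram. Layers: Application Layer, Platform Layer, Data Layer.
Application components: Visualization, Applications; Impala (data analysis engine), Vis (data visualization engine), xxx xx (engine).
Analytics components: Data analysis; R (data mining), Hive (data warehouse), Storm (real-time stream computing), Spark (data stream processing).
Data warehouse and ETL: data cleansing and integration; Sqoop (data migration), Flume (data collection tool), Kettle (ETL tool).
Cluster environment: HDFS, Map/Reduce, runtime system.
Data sources: Central DB, local quality data; Supl 1, Supl 2, ..., Supl n; Fac 1, Fac 2, ..., Fac n.]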

Integrated Big Data Analysis Platform: Application Showcase

Manufacturing Big Data Application: Product batch quality problem monitoring system. 10-year product quality monitoring period; 50M+ products monitored; 30,000+ factories; 1 PB+ data; 80%+ report accuracy; 0% missing rate; 100+ development team; 50+ products; 2015 Huawei President award.

Thank You!!! Questions?