Data Science: The Big Picture @MatthewRenze #SQLServerUserGroupDubai

Job Postings for Data Scientists

Top-paying Tech Skills (table columns: Skill, 2016, Change). Source: Dice Salary Survey 2017

What is data science? Why is it important? Where is this all going?

What is data science?

Data Science (diagram labels): Computer Science, Math and Statistics, Domain Knowledge

What Is a Data Scientist? Performs data science. More than a scientist. More than an analyst. More than a developer.

What skills are necessary?

Data Science Skills: programming, working with data, descriptive statistics, data visualization, statistical modeling, handling Big Data, machine learning, deploying to production

What tools are used?

Data Science Tools (bar chart: share of respondents, 0% to 70%, by tool: language, platform, analytics). Tools shown: SQL, Excel, Python, R, MySQL, Python tools, ggplot, SQL Server, Tableau, JavaScript, Matplotlib, Java, PostgreSQL, Oracle, D3, Homegrown, Hive, Spark, Cloudera, Visual Basic, MongoDB, Hadoop, SAS, C++, PowerPivot, Scala, SQLite, C, Pig, RedShift, Weka, Hbase (EMR), Perl, SPSS, Teradata. Source: O'Reilly 2015 Data Science Salary Survey

How is data science performed?

The Data Science Process (a cycle built around the data): find a question; collect and explore the data; prepare the data; create a model; evaluate the model; deploy the model.

The process is iterative and non-sequential, and it may terminate early.
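The steps of the process can be sketched in code. The following is a minimal illustration under stated assumptions, not the speaker's method: the question, the toy data set, the straight-line model, and the acceptance threshold are all hypothetical and chosen only to make each step concrete.

```python
# Hypothetical walk-through of the process. Every name and number below is
# an illustrative assumption, not something taken from the talk.

# 1. Find a question: "Can hours studied predict an exam score?"

# 2. Collect the data (a made-up data set of (hours, score) pairs).
raw = [(1, 52), (2, 55), (3, 61), (None, 70), (4, 64),
       (5, 70), (6, 73), (7, 79), (8, 82)]

# 3. Prepare the data: drop incomplete rows, then hold out rows for evaluation.
clean = [(h, s) for h, s in raw if h is not None and s is not None]
train, test = clean[::2], clean[1::2]


def create_model(rows):
    """4. Create a model: least-squares fit of score = intercept + slope * hours."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def evaluate_model(model, rows):
    """5. Evaluate the model: mean absolute error on the held-out rows."""
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in rows) / len(rows)


model = create_model(train)
error = evaluate_model(model, test)

# 6. Deploy the model only if it is good enough. Otherwise iterate
#    (return to an earlier step) or terminate early.
MAX_ERROR = 5.0  # assumed acceptance threshold
decision = "deploy" if error <= MAX_ERROR else "iterate or stop"
print(f"mean absolute error = {error:.2f} -> {decision}")
```

In practice the steps rarely run once in a straight line: a poor evaluation typically sends the work back to data preparation or even to the question itself.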

Why is data science important?

Four trends: Data Analytics, Internet of Things, Big Data, Machine Learning

Trends: Past, Present, Future

Driven by economics. Made possible by technology.

Diagram labels: Cost, Value

Data Analysis (The Past)

Collecting, analyzing, and communicating data was difficult, expensive, and slow.

Data Analytics (The Present)

Collecting, analyzing, and communicating data is easy, inexpensive, and fast.

Data-Driven Decision Making (The Future)

Retail Sales (dashboard screenshot). Panels: Total Products; Total Sales; Sales by Product Type; New Products This Year; Annual Sales Comparison by Month; Sales per Square Foot by Total Sales Variance and District

Internet Sales (natural-language query screenshot). Query: "Show me sales by gender and marital status." Result: a chart displaying the sum of sales by gender (Male, Female) and marital status (Married, Single), on an axis from $0k to $15k
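Behind a query such as "Show me sales by gender and marital status" is a grouped sum. The sketch below shows that aggregation in plain Python. The records, field names, and amounts are invented for illustration, and the function `sum_sales_by` is a hypothetical helper, not part of any tool shown in the talk.

```python
from collections import defaultdict

# Made-up sales records. The field names and amounts are assumptions.
sales = [
    {"gender": "Female", "marital_status": "Married", "amount": 1200.0},
    {"gender": "Female", "marital_status": "Single", "amount": 800.0},
    {"gender": "Male", "marital_status": "Married", "amount": 1500.0},
    {"gender": "Male", "marital_status": "Single", "amount": 700.0},
    {"gender": "Female", "marital_status": "Married", "amount": 800.0},
    {"gender": "Male", "marital_status": "Single", "amount": 300.0},
]


def sum_sales_by(records, *keys):
    """Return {(key values...): total amount} for the given grouping keys."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += record["amount"]
    return dict(totals)


totals = sum_sales_by(sales, "gender", "marital_status")
for group, total in sorted(totals.items()):
    print(group, f"${total / 1000:.1f}k")
```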

4% higher productivity; 6% higher profits. Source: https://hbr.org/2012/10/big-data-the-management-revolution

Empowering people to make better decisions is not the end goal; it's just the beginning.

The Internet (The Past)

Cost Speed Bandwidth

The internet was expensive, slow, and not generating much data.

Internet of Things (The Present)

Cost Speed Bandwidth

Growth of the IoT (chart). Y-axis: billions of devices, 0 to 50. X-axis: year, 1990 to 2020. Source: NCTA, 2014

50 billion IoT devices by 2020

The internet of things is cheap, fast, and generating tons of data.

Internet of Everything (The Future)

Cost Speed Bandwidth

An internet connection will likely be as common to devices as electricity.

We're building a peripheral nervous system for our planet... but it needs a brain.

FUN GAME 1 Is It IoT?

Is it IoT?

YES!

Is it IoT?

YES!

Is it IoT?

YES!

Is it IoT?

NO :(

Data (The Past)

Data sets were small, slow, and had little diversity.

Big Data (The Present)

Global Data Growth (chart). Y-axis: data in zettabytes (ZB), 0 to 40. X-axis: year, 2010 to 2020. Source: UNECE Statistics Wikis

Doubling every two years
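"Doubling every two years" is exponential growth, which works out to a factor of about 1.41 per year (the square root of 2). A minimal sketch of the arithmetic follows. The function name and the starting volume of one unit in 2010 are assumptions for illustration and are not read off the chart.

```python
def projected_volume(start_volume, start_year, year, doubling_period_years=2.0):
    """Volume after growth that doubles every `doubling_period_years` years."""
    elapsed_years = year - start_year
    return start_volume * 2 ** (elapsed_years / doubling_period_years)


# Assumed starting point for illustration only: 1 unit of data in 2010.
for year in (2010, 2012, 2016, 2020):
    print(year, projected_volume(1.0, 2010, year))
```

Five doublings in ten years multiply the starting volume by 32.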

Big Data: Volume, Velocity, Variety

Volume

Velocity

Variety

INTEGRATION

Image labeling example. Person 1: Gender: Female, Age: 31, Emotion: Happy. Person 2: Gender: Male, Age: 5, Emotion: Happy. Object: Apple

Today's data sets are bigger, faster, and more diverse.

Just Data Again (The Future)

Cost

We're creating new tools to automate feature extraction... but it requires machine learning.

FUN GAME 2 Am I Smarter Than Big Data?

What Do These Have In Common?

What Do These Three Things Predict?

Artificial Intelligence (The Past)

Source: Evan Amos

AI Winter has ended and things are warming up again.

Machine Learning (The Present)

Artificial Intelligence Machine Learning Statistics

f(x): Data, Function, Prediction. Example: the data are examples labeled Cat or Dog; the function is asked "Is cat?"; the prediction is "Yes".
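The idea that machine learning turns data into a function, which then turns new data into a prediction, can be sketched in a few lines. This is only an illustration: the two features (weight and height), the sample values, and the nearest-centroid method are assumptions, not the technique used in the talk.

```python
# Made-up training data: (weight in kg, height in cm) -> label.
training_data = [
    ((4.0, 25.0), "Cat"),
    ((5.0, 23.0), "Cat"),
    ((20.0, 55.0), "Dog"),
    ((30.0, 60.0), "Dog"),
]


def learn_function(examples):
    """Learn f from labeled data: one average point (centroid) per label."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ((0.0,) * len(features), 0))
        sums[label] = (tuple(t + v for t, v in zip(total, features)), count + 1)
    centroids = {
        label: tuple(t / count for t in total)
        for label, (total, count) in sums.items()
    }

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def f(x):
        """Predict the label whose centroid is closest to x."""
        return min(centroids, key=lambda label: squared_distance(x, centroids[label]))

    return f


f = learn_function(training_data)   # Data -> Function
prediction = f((4.5, 24.0))         # Function -> Prediction
print("Is cat?", "Yes" if prediction == "Cat" else "No")
```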

The next generation of ML will be able to complete even more complex tasks.

Deep Learning (The Future)

Deep Learning (example labels): Human, Cat, Dog, Car

Deep Neural Network (diagram). Layers: input, hidden 1, hidden 2, hidden 3, output. Example output labels: John, Jane, Miko, Lee
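The layer structure on the slide (input, three hidden layers, output) corresponds to a forward pass like the one sketched below. This is a hedged illustration only: the layer sizes, the ReLU and softmax functions, the input values, and the random weights are all assumptions, and the network is untrained, so its output is meaningless beyond showing how data flows through the layers.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible

# input, hidden 1, hidden 2, hidden 3, output (sizes are assumed)
LAYER_SIZES = [4, 5, 5, 5, 4]
NAMES = ["John", "Jane", "Miko", "Lee"]  # output labels from the slide


def make_layer(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


layers = [make_layer(a, b) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def affine(weights, biases, x):
    """Weighted sum of the inputs plus a bias, for each unit in the layer."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x):
    """Pass x through the three hidden layers, then the output layer."""
    *hidden_layers, output_layer = layers
    for weights, biases in hidden_layers:
        x = relu(affine(weights, biases, x))
    weights, biases = output_layer
    return softmax(affine(weights, biases, x))


probabilities = forward([0.2, 0.7, 0.1, 0.9])  # made-up input features
best = NAMES[probabilities.index(max(probabilities))]
print(dict(zip(NAMES, probabilities)), "->", best)
```

Training would adjust the weights so that the highest probability lands on the correct label; that step is omitted here.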

f(x)

AI Winter 2.0? Or Human Winter 1.0?

FUN GAME 3 Dog or Mop?

MOP!

DOG!

DOG!

MOP!

DOG! MOP!

Closing the Loop (The Future of Data Science)

Data Analytics, Internet of Things, Big Data, Machine Learning

Fully Autonomous Systems

Smart systems, cloud robotics, cyber-physical systems

Embedded intelligence will be woven into the fabric of our society.

The Big Data Universe (amount of data as of 2016, in petabytes). Human brain: 2.5 PB; eBay: 90 PB; Google: 15,000 PB (estimated); Spotify: 10 PB; Facebook: 300 PB. Source: The Royal Society, 2016

Automation Framework (chart). Axes: complexity (low to high) and repetitiveness (low to high). Quadrants: Routine & Complex; Non-routine & Complex; Routine & Simple; Non-routine & Simple. Source: Abhas Gupta, The Automation Framework

Automation Technology (same axes). Labels: Deep Learning; High-level Programming; Conventional Machine Learning. Source: Abhas Gupta, The Automation Framework

Retail Salesperson (tasks plotted on the same axes): Fold Clothes; Greet Customers; Convert Customers; Sizing; Inventory Count; Inventory Pull; Cash Register. Source: Abhas Gupta, The Automation Framework

Medical Doctor (tasks plotted on the same axes): Treatment Plan; Diagnose Disease; Rare Disease; Build Trust; Routine Check-up; Input EMR; Write Prescription. Source: Abhas Gupta, The Automation Framework
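The framework's two axes can be read as a simple classification rule. The sketch below is an interpretation, not the framework's published definition: it assumes that "routine" means high repetitiveness, that both scores are normalized to the range 0 to 1, and that 0.5 separates low from high. The task scores are invented.

```python
def automation_quadrant(complexity, repetitiveness, threshold=0.5):
    """Place a task in one of the four quadrants.

    Assumptions: scores run from 0 to 1, "routine" means high
    repetitiveness, and `threshold` splits low from high.
    """
    routine = "Routine" if repetitiveness >= threshold else "Non-routine"
    level = "Complex" if complexity >= threshold else "Simple"
    return f"{routine} & {level}"


# Made-up (complexity, repetitiveness) scores for two tasks named on the slides.
tasks = {
    "Fold Clothes": (0.2, 0.9),
    "Rare Disease": (0.9, 0.1),
}
for name, (complexity, repetitiveness) in tasks.items():
    print(name, "->", automation_quadrant(complexity, repetitiveness))
```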

Source: CGP Grey

Rise of the Robots (chart). Y-axis: industrial robots per 1,000 US workers, 0 to 2. X-axis: year, 1993 to 2014. Source: International Federation of Robotics

Data science will amplify this trend

Which side of this new economy will your job be on: the side that's leading or the side being eliminated?

What will you choose?

Welcome Robot Overlords!!!

Where Do We Go Next?

Where to Go Next
Pluralsight: https://www.pluralsight.com
Coursera: https://www.coursera.org
Data Camp: https://www.datacamp.com

Recommended Courses
Data Science: The Big Picture
Data Science with R
Exploratory Data Analysis with R
Data Visualization with R (3-part)
https://www.pluralsight.com/authors/matthew-renze

www.matthewrenze.com

Feedback: Very important to me! What did you like? What could I improve?

Conclusion

Internet of Things Data Analytics Machine Learning Big Data

Are you prepared? Is your organization? Is our world prepared?

Thank You!
Matthew Renze
Data Science Consultant
Renze Consulting
Twitter: @matthewrenze
Email: info@matthewrenze.com
Website: www.matthewrenze.com