DevSci: Better Software Through Data Science @MatthewRenze #KCDC2018

What is data science? Why is it important? How do I get started?

Job Postings for Data Scientists

Top-Paying Tech Skills [table: Skill, 2016, Change]. Source: Dice Salary Survey 2017

About Me: Data Science Consultant. Education: B.S. in Computer Science (ISU), B.A. in Philosophy (ISU). Community: keynote speaker, Pluralsight author, DataCamp author, Microsoft MVP AI, ASPInsider.

What is data science?

[Venn diagram: Data Science at the intersection of Computer Science, Math and Statistics, and Domain Knowledge]

Data → Knowledge → Decision → Action

What Is a Data Scientist? Performs data science; more than a scientist, more than an analyst, more than a developer.

What skills are necessary?

Data Science Skills: programming, working with data, descriptive statistics, data visualization, statistical modeling, handling Big Data, machine learning, deploying to production.

What tools are used?

Data Science Tools [bar chart: share of respondents (0% to 70%) by tool (language, platform, analytics)]. Tools shown: SQL, Excel, Python, R, MySQL, Python tools, ggplot, SQL Server, Tableau, JavaScript, Matplotlib, Java, PostgreSQL, Oracle, D3, Homegrown, Hive, Spark, Cloudera, Visual Basic, MongoDB, Hadoop, SAS, C++, PowerPivot, Scala, SQLite, C, Pig, RedShift, Weka, Hbase (EMR), Perl, SPSS, Teradata. Source: O'Reilly 2015 Data Science Salary Survey

How is data science performed?

The Data Science Process [cycle diagram around Data]
1. Find a question
2. Collect the data (labeled "Explore the data" on the later slides)
3. Prepare the data
4. Create a model
5. Evaluate the model
6. Deploy the model

The process is iterative and non-sequential, and it may terminate early.
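As a rough illustration of how the steps of the process fit together, the sketch below walks one toy question through collecting, preparing, modeling, and evaluating. The question, the data, the 0.9 threshold, and every function name are hypothetical and not from the talk; the model is a plain least-squares line fitted by hand so that nothing beyond the Python standard library is needed.

```python
# Toy walk-through of the process; all data, names, and thresholds are hypothetical.

# 1. Find a question: does build time grow with the number of changed files?
# 2. Collect the data: (files_changed, build_minutes); None marks a missing value.
raw = [(1, 2.1), (2, 2.9), (3, None), (4, 5.2), (5, 5.8), (6, 7.1)]

# 3. Prepare the data: drop rows with missing measurements.
rows = [(x, y) for x, y in raw if y is not None]


def fit_line(points):
    """4. Create a model: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points, slope, intercept):
    """5. Evaluate the model: fraction of the variance in y explained by the line."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(rows)
score = r_squared(rows, slope, intercept)

# 6. Deploy the model only if it is good enough; otherwise iterate or stop early.
if score >= 0.9:
    print(f"deploy: minutes = {slope:.2f} * files + {intercept:.2f} (R^2 = {score:.3f})")
else:
    print(f"iterate or stop early (R^2 = {score:.3f})")
```

The final branch is where the "iterative" and "early termination" properties show up: a poor score sends the work back to an earlier step, or ends it.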

Why is data science important?

Two Main Approaches: build intelligent software; improve development practices

Two Main Approaches: Build intelligent software

Internet Sales [natural-language query demo]. Query: "Show me sales by gender and marital status." Response: "Displaying sum of sales by gender and marital status" [bar chart: Male and Female, split by Marital Status (Married, Single); axis $0k to $15k]

Machine Learning [figure with example labels: Human, Cat, Dog, Car]

Anticipatory Design: Collect Data → Create Algorithm → Anticipate Choices

Two Main Approaches: Improve development practices

Data-Driven Decision Making: Build → Measure → Learn (repeat)

Hypothesis-Driven Development: Hypothesis → Experiment → Analysis (repeat)

Example 1
Hypothesis: Users will prefer feature A over feature B
Experiment: Survey 100 users and ask for their preference
Analysis: 80% of users prefer feature A

Example 2
Hypothesis: Pair programming will increase our long-term velocity
Experiment: Pair for 4 sprints and track velocity
Analysis: Velocity increased by 20% per sprint
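The slides do not say how the analysis step is carried out. One possible check for the feature A versus feature B survey is an exact one-sided binomial test: if users had no real preference, how likely is it that 80 or more out of 100 would pick A? The sketch below assumes each user answers independently and picks exactly one feature; the function name and the 0.05 threshold are illustrative choices, not part of the talk.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """One-sided exact binomial test: P(X >= successes) when the true rate is p_null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Figures from the slide: 100 users surveyed, 80% prefer feature A.
p_value = binomial_p_value(successes=80, trials=100)

# alpha = 0.05 is a conventional significance threshold, assumed here.
alpha = 0.05
print(f"p-value = {p_value:.2e}; reject 'no preference': {p_value < alpha}")
```

A very small p-value means the survey result would be hard to explain by chance alone, which supports the hypothesis that users prefer feature A.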

Hypothesis Stories (template)
<Hypothesis>
We assume that <hypothesis>
will result in <outcome>.
We will have succeeded when <measurable result>.

Hypothesis Stories (example)
Pair Programming Hypothesis
We assume that pair programming
will result in higher long-term velocity.
We will have succeeded when we have seen a 10% or greater increase in velocity after 4 sprints.
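A hypothesis story is only useful if its success criterion can be checked mechanically. The sketch below shows one way to encode the pair programming story and test it against sprint data. The `HypothesisStory` class, its field names, and the velocity numbers are all made up for illustration, and comparing mean velocity before and during the trial is just one reasonable reading of "a 10% or greater increase in velocity after 4 sprints".

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HypothesisStory:
    """Hypothetical container mirroring the template: assumption, outcome, success threshold."""

    assumption: str
    outcome: str
    min_relative_gain: float  # 0.10 means "10% or greater increase"

    def succeeded(self, baseline: list[float], trial: list[float]) -> bool:
        """Compare mean velocity during the trial against mean baseline velocity."""
        gain = (mean(trial) - mean(baseline)) / mean(baseline)
        return gain >= self.min_relative_gain


story = HypothesisStory(
    assumption="pair programming",
    outcome="higher long-term velocity",
    min_relative_gain=0.10,
)

# Made-up story points per sprint: 4 sprints before pairing, 4 sprints while pairing.
baseline_velocity = [30, 32, 28, 30]
pairing_velocity = [31, 34, 36, 35]

print(story.succeeded(baseline_velocity, pairing_velocity))
```

Writing the threshold down before the experiment starts keeps the team from redefining success after seeing the numbers.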

A/B Testing
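The slide gives only the title, so the following is a generic sketch of how an A/B test result is often analyzed, not the speaker's method. It applies a two-proportion z-test to compare conversion rates between variant A and variant B. The function name and the counts are hypothetical, and the normal approximation assumes reasonably large samples.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, computed via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up counts: variant A converts 120 of 1000 users, variant B 150 of 1000.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A positive z means variant B converted at a higher rate; the p-value indicates how surprising a gap of that size would be if the two variants performed the same.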

Feature Toggles [diagram: New Feature → Feature Toggles → User Groups]
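The diagram shows a new feature routed through toggles to user groups. A minimal sketch of that idea follows, assuming a toggle can be switched on for whole groups and, optionally, for a percentage of everyone else. The configuration shape, the function names, and the `new-checkout` feature are invented for illustration and do not correspond to any particular feature-flag library.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) so the same user always gets the same answer."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, group: str, toggles: dict) -> bool:
    """Check a toggle: on for groups listed in 'groups', or for a percentage of everyone else."""
    rule = toggles.get(feature)
    if rule is None:
        return False  # unknown features stay off
    if group in rule.get("groups", ()):
        return True
    return rollout_bucket(user_id, feature) < rule.get("percent", 0)


# Hypothetical configuration: beta testers always see the feature, plus 20% of other users.
TOGGLES = {"new-checkout": {"groups": {"beta"}, "percent": 20}}

print(is_enabled("new-checkout", "user-42", "beta", TOGGLES))
```

Hashing the user ID rather than picking at random keeps each user's experience stable across sessions, which also matters when the toggle is used to split users for an A/B test.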

DevOps Pipeline: Code → Source Control → Build → Q/A → Deploy → Prod

Code Quality Metrics (source: NDepend)

Source Control Metrics

Build Metrics (source: Visual Studio Team Services)

Q/A Metrics

Deployment Metrics (source: Octopus Deploy)

Software Telemetry

How do I get started?

What are the ingredients of a data-driven enterprise?

Strategy, People, Data, Technology, Culture

Strategy

People

Data

Technology

Culture

What is the process of becoming a data-driven enterprise?

[pyramid, bottom to top: Measure, Organize, Analyze, Predict, AI]

1. Measure: Transactions, Instrumentation, Logging, Surveys, Digitization, External data

2. Organize: Transform, Clean, Store, Data ETL, Data Warehouse, Data Lake

3. Analyze: Reports, Dashboards, KPI monitors, Decision support, Descriptive analytics, Diagnostic analytics

4. Predict: Predictive analytics, Prescriptive analytics, Machine learning, Hypothesis testing, Experimentation

5. Automate (AI): Artificial intelligence, Expert systems, Deep learning

Advice for Success: get buy-in from leadership; focus on low-hanging fruit; don't silo data science teams; democratize your data.

Advice for Success: embrace smart failure; focus on feedback; embed data collection; avoid the Observer Effect.

Where to Go Next?

Where to Go Next: DataCamp (https://www.datacamp.com), Pluralsight (https://www.pluralsight.com), Coursera (https://www.coursera.org)

Pluralsight Courses: Data Science: The Big Picture; Data Science with R; Exploratory Data Analysis with R; Data Visualization with R (3-part); Deep Learning: The Big Picture. https://www.pluralsight.com/authors/matthew-renze

www.matthewrenze.com

Feedback Very important to me! What did you like? What could I improve?

Conclusion

What data science is; why it is important; how to get started

Are you prepared? Is your organization? Is our world prepared?

Thank You! Matthew Renze, Data Science Consultant, Renze Consulting. Twitter: @matthewrenze. Email: info@matthewrenze.com. Website: www.matthewrenze.com