Making Data Science Simple

1 Making Data Science Simple. IBM Code Tech Talk, Oct 18th

2 IBM Code Tech Talk: Making Data Science Simple. David Taieb, Scott D'Angelo. October 18th, 2017

3 Agenda: Pains and Promises of Data Science; Watson Data Platform overview; PixieDust overview; Developer Journey: Analyze San Francisco traffic data with Jupyter and PixieDust; Sneak preview of PixieGateway: share your charts and publish your PixieApps on the Web; Q&A

4 Making Data Science into a Team Sport. Data steward, Data scientist, Data engineer, Business Analyst, Developer. Find, Share, Collaborate. Projects, Data Assets, Pipelines, APIs. Analyze, Ingest, Deploy, Persist. Intelligent governance, Policy enforcement.

5 What Do Businesses Care About? Gain competitive advantage: better understanding of market, customers and competitors. Innovation and insights at speed: discover new insights and new sources of disruption; innovation leading to new business models; real-time (or close to it) decision making. Reduce complexity and lower cost: accelerate time to market and deployment of data science and analytics.

6 Pains: Shortage of experts in key fields. Inefficient use of Data Scientists: 80% of the time spent finding, cleaning, organizing data; 20% spent on data analysis. Obstacles to insights at speed: overly complex and inaccessible analytics tools; lack of data governance and cataloging; prototyping vs. productizing. Lack of collaboration across business, developers and data science teams.

7 It feels like this

8 when we want it to be like this

9 Where to Start? Tools: Simple, Standardized, Integrated. Services: Abstract/Harness Complexity, Scalable, Open Source. Data: Access to curated contents, Governance. Community: Break Silos, Repeatable, Productivity.

10 IBM Watson and Cloud Platform (diagram labels). Application: Healthcare, Financial Services, Logistics, DSX / WML, IoT, Virtual agent. AI: Conversation, Visual Recognition, Discovery, Speech, Compare + Comply, Document Conversion, Knowledge Query, Nat. Language Understanding, Tone Analysis, Nat. Language Classifier, Personality Insights + more... Data: Ingest (Public, Private, Licensed), Enrich, Store, Analyze, Apply; The Watson Data Platform; WML. Cloud: Dev Services, Infrastructure, Containers, Storage, Messaging, Compute, Blockchain, Physical Network, Logging, Infrastructure Mgmt + more...

11 Watson Data Platform: an integrated platform of tools, services and data that helps companies accelerate their shift to become data-driven. Data Engineer: builds data pipelines that power dashboards and data platforms while ensuring high quality. Business Analyst: works with data to apply insights to the business strategy. Data Steward: drives governance policy effectiveness while tracking how data is used and its value to the company. Data Scientist: prepares data to tease out the insights they're looking for, without IT involvement. App Developer: makes insights immediately actionable and adds intelligence to apps in a straightforward manner.

12 Data Science Experience: Data Science as a Team Sport. Built on the IBM Cloud platform. Lets data scientists/engineers, analysts and stakeholders collaborate to collect, share, explore and analyze data in order to derive insights and train models, and share or deploy resulting assets.
 - Projects: collaborate as a team or work individually
 - Jupyter Notebooks + IBM value add
 - Machine Learning integrated in Projects: use ML Wizard and Flows to train models
 - RStudio integrated with Spark
 - DSX integrates with data in many places
 - Integrated in Projects with access control
 - Spark integration with R/Python/Scala kernels
 - Versions, comments, share link, publish to GitHub
 - PixieDust, Brunel,
 - Object Storage (SWIFT now, new Cloud Object Storage soon)
 - Watson Data Platform Services and WDP Catalog
 - Message Hub and IBM Streaming Analytics
 - Can call any IBM service, e.g. Watson, Quantum, etc.
 - Third-party data services on premises or on other clouds
Try it yourself at

13 Projects allow users to work and collaborate. Users (data scientists, data engineers, analysts, ...) can add data and analytics assets from the Community (Internet, public) to their Project, and can add data and analytic assets from the Catalog (closed beta) to a Project or share assets to the Catalog. Diagram labels: Bluemix Account; Members (Admins, Editors, Readers); Data (Connections, Files, Data Sets); Analytic Assets (Notebooks, Flows, Models, Dashboards); DB; Kafka Topic; Object Storage Bucket (per Project), currently SWIFT OS, 4Q: move to Cloud Object Storage with S3 APIs; Catalog Metadata (closed beta). Closed beta participants have early access to Catalog.

14 DSX - Machine Learning in Projects, deploying to WML. Create & Train ML in DSX Projects (ML Flows, ML Wizard, ML Notebooks); Deploy ML Models (ML Flows); Invoke Watson Machine Learning Service (REST API to call Model; Apps on Bluemix or others). Data scientists can create and train ML models in DSX using data in Projects. Use ML Wizard for assisted creation and training of models using common patterns and algorithms. Use Notebooks or Flows to train models for more advanced use cases and more flexibility. Deploy models to the Watson Machine Learning (WML) service in Bluemix to run them in production. Use the WML REST API to invoke your models for online scoring / predictions.
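The last step on this slide, invoking a deployed model over REST for online scoring, might look roughly like the sketch below. It only builds the HTTP request and does not send it. The endpoint URL, the token, the field names and the `fields`/`values` payload layout are assumptions made for illustration and are not taken from the slides; the WML API reference is the place to confirm the actual contract.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, token, fields, rows):
    """Build (but do not send) an HTTP POST request for online scoring.

    The payload shape {"fields": [...], "values": [[...], ...]} is an
    assumed layout for illustration, not a documented guarantee.
    """
    # Every row must supply exactly one value per declared field.
    if any(len(row) != len(fields) for row in rows):
        raise ValueError("each row must have one value per field")

    payload = {"fields": list(fields), "values": [list(row) for row in rows]}
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


# Hypothetical endpoint, token and feature names, for illustration only.
request = build_scoring_request(
    "https://example.com/hypothetical/scoring/endpoint",
    "HYPOTHETICAL_TOKEN",
    ["hour", "day_of_week"],
    [[8, "Monday"], [17, "Friday"]],
)
# To actually call a service: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before pointing it at a real deployment.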

15 WDP Catalog (architecture diagram labels): Business Intelligence; Data Science; Integrated Tools; Integration & Collaboration; Data & Analytics Processing; Protected Data Access; Ingest and Persistence; Analysis; Model Building; Tools; Content; Unified Console; Visualization; Dashboards; Interactive Query; Data Flow Orchestration; Data Engineering; Development; Custom Apps; 3rd Party Apps; Add-ons; Projects and Community; Data Flows; Data Provisioning; Metadata Catalog; Connectors and Tools; Scale Productivity via Integration & Collaboration; Scheduler API; Streams; Pipeline API; Security / Access; Data Shaping; Auditing; Deployment; Lineage; Advanced Analytics; Governance Policy; Data Quality. On Prem & External: DB2, Oracle, SAP; Azure, Amazon, SalesForce, Twitter, Weather; Streams, APIs, Public Sites. Genesis: Cloud ObjectStore, DashDB, Cloudant, Elastic Search. Softlayer/Bluemix: Cloud ObjectStore, DashDB, Cloudant, Elastic Search. Drive day to day business outcomes through Data Science. Unlock Data with Confidence.

16 Governed Catalog of assets. For: Data Scientist, Data Engineer, Business Analyst, CDO Office. Intelligent data catalog providing trusted access to governed data assets. Guiding users to the best data for their purpose. Driving data reuse and discoverability. Shared component available throughout the experiences of WDP.

17 PixieDust

18 DEMO Developer Journey: Analyze San Francisco traffic data with Jupyter and PixieDust

19 PixieGateway: Share charts on the Web. Deploy Notebooks and PixieApps and run them as regular web apps.

20 PixieGateway: Architecture

21 DEMO PixieGateway

22 Learn More. Watson Data Platform: ibm.co/watsondataplatform. Code Journey: /analyze-san-francisco-traffic-data-with-ibm-pixiedust-and-data-science-experience. IBM Watson Data Lab: medium.com/ibm-watson-data-lab. PixieDust: ibm.co/pixiedust. Data Catalog: ibm.biz/data-catalog.