Making your Data Ready for AI

1 Making your Data Ready for AI
Daniel G. Hernandez, VP, IBM Analytics / 2018 IBM Corporation

2 Each of us makes 35,000 decisions a day

3 Some decisions are challenging

4 Some decisions are regrettable

5 Each day
If you are awake for 16 hours a day, then you make 2,188 decisions every hour.
If you work for 8 hours a day, then you make 17,500 decisions at work each day.
That's 36 decisions per minute, or ~2 seconds per decision.
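
The arithmetic on this slide can be checked in a few lines of Python. This is only a sketch: the 35,000 figure and the 16-hour and 8-hour assumptions are the slide's own, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the slide's numbers.
DECISIONS_PER_DAY = 35_000  # figure quoted on the slide
HOURS_AWAKE = 16            # slide's assumption
HOURS_AT_WORK = 8           # slide's assumption

per_hour = DECISIONS_PER_DAY / HOURS_AWAKE   # decisions per waking hour
at_work = per_hour * HOURS_AT_WORK           # decisions made during the workday
per_minute = per_hour / 60                   # decisions per waking minute
seconds_per_decision = 60 / per_minute       # time available per decision

print(f"{per_hour:.1f} decisions per hour")
print(f"{at_work:.0f} decisions at work each day")
print(f"{per_minute:.1f} decisions per minute")
print(f"{seconds_per_decision:.2f} seconds per decision")
```

At roughly 36 decisions per minute, each decision gets a little under two seconds.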

6 Most of the decisions we make are boring.

7 What if we seek out and erase the boring?
"If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future." - Andrew Ng, Former Chief Scientist of Baidu (2016)

10 AI is anything a computer can do that feels like MAGIC today

15 I want AI!
In an MIT & BCG survey of more than 3,000 executives, managers, and analysts across industries:
75% believe AI will enable their companies to move into new businesses
85% believe AI will allow their companies to obtain or sustain a competitive advantage
Source: Reshaping Business With Artificial Intelligence, Sam Ransbotham, David Kiron, Philipp Gerbert, and Martin Reeves, MIT Sloan Management Review, Fall 2017

16 In the same global survey of more than 3,000 executives, managers, and analysts across industries:
1/5 have incorporated AI in some offerings or processes
1/20 have extensively incorporated AI in offerings or processes
39% of all companies have an AI strategy in place (50% when only counting companies with at least 100,000 employees)

17 There is no AI without IA

18 The capabilities delivered by IBM Analytics build a ladder to AI
AI
Embed machine learning everywhere
Analyze insights on demand
Organize data so it can be trusted
Collect relevant data and make it simple & accessible

19 Collect > Organize > Analyze
80% of your data is NOT searchable

20 And our data universe is expanding
Today: only 33% of all enterprise workloads run on multiple clouds; 67% are installed on traditional IT
By 2022: 75% of all enterprise workloads run on multiple clouds; 25% are installed on traditional IT

21 To collect data, we need a comprehensive data management strategy
Our hybrid data management design principles (Maximize value, Optimize cost):
Multi-Cloud Availability: Private & Public Cloud
Load & Go: Enable data-driven decisions for everyone
All Data: Structured and unstructured
Transactional & Analytical
At Rest & In-Motion: From event-driven to traditional systems
Cloud Native: Elastic and resource optimization
Built-in AI: Smarter decisions and optimized operations

22 The capabilities delivered by IBM Analytics build a ladder to AI

23 Unfortunately, we believe most of what we see

24 Meanwhile, finding relevant data for training and inference remains a challenge.
The Situation
No Data = No AI
Data science teams cannot be productive without high-quality, relevant data
Most enterprises have data scattered across legacy & digitally native apps
AI is trained on data to automate tasks & decision making.
Garbage In, Garbage Out

25 To organize data, we need a data catalog strategy that allows us to trust our data
Our unified governance and integration design principles (Catalog Strategy):
Multi-Cloud Availability: One scalable catalog available everywhere
All Data: Works with all data types no matter the format & location
Catalog Micro-services: Catalog embedded in IBM & partner offerings
Open Metadata: Ensures the IBM platform is open and can easily be integrated with
Persona Specific Experiences: Designed for data scientists, data engineers, stewards and analysts
Built-in AI: Exploits Machine Learning for intelligence and automation

26 The capabilities delivered by IBM Analytics build a ladder to AI

28 Models are the foundation of any AI solution. You need the skills & capability to build & deploy models
MODELS automate tasks, make predictions, generate insights, and explain things
DATA > Build > Run > Manage AI > USERS
Activities: Prepare, Deploy, Orchestrate, Train, Test, Monitor for Trust & Bias, Evaluate, Monitor for Performance, Manage System Lifecycle
Roles for Build: Data Scientist, AI Developer
Roles for Run: Data Scientist, AI Developer, AI Ops
Roles for Manage AI: Data Scientist, AI Developer, AI Ops

29 When talent leaves, the IP leaves with them
By 2020, the number of jobs for all US data professionals will increase by 364,000 openings to 2,720,000

30 To analyze data, we need a complete, end-to-end stack for data science & AI
Our data science and machine learning design principles (Data Science & AI for All):
Multi-Cloud Availability: Build Anywhere, Deploy Anywhere
All Data: Bring the power of machine learning to all types of data
Collaborative: One system designed to help teams build and deploy data science and AI apps
Open Source Integration: Use languages & tools that the community uses
Monitor and Measure: Measure the performance and results of your models
Seamlessly deploy models: Use an extensive library of algorithms and frameworks in one runtime

31 The capabilities delivered by IBM Analytics build a ladder to AI

32 Watson Studio: One Development Environment for Data Science & AI
Learn: Content that educates & helps users get started
  Samples of Notebooks, Models, & Data
  Tutorials that showcase the latest techniques and features
  Social engagement through bookmarking and sharing
Build: One Experience for building, training & evaluating models
  OSS support of Python, R, Scala, & top DS + AI frameworks like TensorFlow
  Visual & Code Editors for Jupyter, RStudio, Zeppelin, & SPSS
  Integrations: Watson Developer APIs, Data Mirror, and other ISVs
Collaborate: Designed to help teams get work done
  Projects assemble & organize work
  Version control and access controls enable distributed teams & protect IP
  Comments track the team activity, explain assets, and improve reuse

33 Watson Machine Learning: One Runtime for Data Science & AI
Open: One runtime for Open Source and IBM algorithms
  Common experience for OSS & IBM frameworks & algorithms
  Supports SPSS, TensorFlow, Keras, scikit-learn, XGBoost, and more
  Track training job progress
Deploy: Publish models into production & expose as APIs
  Multiple deployment options including private, aaS, Hadoop, and Z
  Run models built using Watson Studio, SPSS and third-party tools
  Scale to millions of predictions in seconds
Run: Monitor & continuously improve models post deployment
  Automate model retraining with scheduling & triggers
  Automatic algorithm selection via CADS technology from IBM Research
  Monitor model health, usage, and lineage through dashboards

34 IBM Cloud Private for Data: Delivering the Ladder to AI in one platform
Cloud Native, Lightning Fast, AI-ready, Simplifies & Unifies
Collect Data: Collect data of every type, no matter where it lives, and achieve freedom from ever-changing data sources.
Organize Data: Organize your data into a trusted, business-aligned source of truth to put data to work in new ways.
Analyze Data: Analyze your data in smarter ways, empowering all your teams with Machine Learning Everywhere.

35 Check it out at

36 Bonus Material: Case Studies

37 Predicting the likelihood of sepsis in patients
Problem: Sepsis is a potentially life-threatening complication of an infection for which early detection and treatment can be the difference between life and death.
Outcome: IBM's Data Science & AI Elite team built a Machine Learning model that was proven to predict the probability of sepsis for any given patient. This was used to guide doctors' prevention and treatment plans.
Key solutions: Machine Learning in IBM's Data Science portfolio was used to build a predictive model of all-cause death in sepsis patients while admitted to the hospital or through 90 days after discharge, and to look for actionable predictors that can help influence and improve patients' outcomes. Used clinical data from 10,000 patients.
Data: De-identified clinical data over a 10-year period from 10,000 patients diagnosed with sepsis. Features included demographics, lab results, disease history, healthcare visit history, vitals, and medications. In 2015 Geisinger Health System implemented an IT system called a Unified Data Architecture (UDA) to bring data from different information streams into one place for an aggregate view of all sepsis-patient information.

38 Increasing operational efficiency with improved corporate hierarchies
Problem: Experian Business Information System (BIS) maintains a database of over 5,000 corporate hierarchies based on the legal affiliation between corporate entities. This database is used to help clients mitigate risk and improve profitability. Categorization of corporate hierarchies is done through a combination of distributional entity matching, heuristic rules, and manual processing. The current approach has a shortcoming: incoming data can vary widely in how locations and company names are reported, which reduces the accuracy of the entity matching.
Outcome: Improved the accuracy of entity matching with recurrent neural networks and increased the system accuracy in finding new matches, thereby reducing manual processing. Retrained some of the staff to become data scientists in the process.
Key solutions: IBM Watson Studio, Watson Machine Learning, IBM Cloud Object Storage
Data: Base file containing information on business operating locations (3.6M) and 23 fields associated with each one. Location variation ID (LVID) file, in which each bin could be connected to several hundred LVIDs.
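
The problem this slide names, variation in how company names are reported, can be illustrated with a small sketch using only the Python standard library. This is not the recurrent-neural-network approach described in the outcome; it is a simple normalize-and-compare baseline. The suffix list, function names, and company names are all hypothetical and chosen only for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore; not taken from the source.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "llc", "ltd", "limited",
}

def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    kept = [t for t in tokens if t not in LEGAL_SUFFIXES]
    return " ".join(kept)

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

# Made-up names: two spellings of one entity, and one unrelated entity.
same = similarity("Acme Widgets, Inc.", "ACME WIDGETS INCORPORATED")
different = similarity("Acme Widgets, Inc.", "Globex Shipping LLC")

print(f"same entity, different spelling: {same:.2f}")
print(f"different entities: {different:.2f}")
```

A rule-based baseline like this handles case, punctuation, and suffix differences, but not abbreviations, misspellings, or reordered words, which is the kind of variation a learned matcher is meant to absorb.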