TOP 5 LESSONS LEARNED IN DEPLOYING AI IN THE REAL WORLD. Joshua Robinson, Founding Engineer


Rafe Daniels
5 years ago

1 TOP 5 LESSONS LEARNED IN DEPLOYING AI IN THE REAL WORLD Joshua Robinson, Founding Engineer


2 AI REVOLUTIONIZING EVERY INDUSTRY
Automotive - Volvo: Volvo and its subsidiary, Zenuity, are racing to build fully autonomous cars by 2021
Farming - Blue River: 10% of lettuce in the US is harvested by LettuceBot, using AI to maximize crop yield & minimize chemicals
Kitchen - LG: LG Instaview refrigerator with AI-powered Alexa helps owners shop for groceries with their voice
Marketing - SAP: SAP's Brand Impact tool measures ROI of brand sponsorships using video analytics and AI
Healthcare - Mayo Clinic: Finds genetic markers in images to avoid surgery for tumor samples & recommend treatments
Consumer Goods - P&G: Procter & Gamble's Olay is using AI to inspect skin & improve trouble spots, all with a mobile phone
PURE STORAGE INC.

3 SOFTWARE 2.0
AI REPRESENTS FUNDAMENTAL SHIFT IN HOW SOFTWARE IS WRITTEN
Software 2.0: Comic:

4 QUESTION ON EVERYONE'S MIND: WHY IS A STORAGE COMPANY HERE?


5 THE BIG BANG OF INTELLIGENCE
FUELED BY PARALLEL COMPUTE, NEW ALGORITHMS, AND BIG DATA
NEW ALGORITHMS: Massively Parallel, Delivering Superhuman Accuracy
MODERN COMPUTE: Massively Parallel Architecture Driving Performance (GPU - THOUSANDS OF CORES)
BIG DATA: Data is the New Oil, 50 Zettabytes Created in 2020

6 TECHNOLOGIES OF THE BIG BANG
WHAT CUSTOMERS DEPLOY: FRAMEWORKS, GPU SERVER, STORAGE

7 DATA IS VITAL TO MACHINE LEARNING
OBSERVATION BY PROF. ANDREW NG, AI LUMINARY

8 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline

9 WHAT MOST THINK IS AI
NEW POSSIBILITIES: For Nearly Every Industry
FRAMEWORKS: To Get Started
GPU: The Engine

10 AI IS SO MUCH MORE Hidden Technical Debt in Machine Learning Systems, Google NIPS

11 COMPLEXITIES OF AI IN PRODUCTION
INGEST: From sensors, machines, & user generated
CLEAN & TRANSFORM (CPU Servers): Label, anomaly detection, ETL, prep, stage
EXPLORE (GPU Server): Quickly iterate to converge on models
TRAIN (GPU Production Cluster): Run for hours to days in production cluster
Between each pair of stages: COPY & TRANSFORM

12 WIDE RANGE OF NEEDS IN AI PIPELINE
SIGNIFICANT CHALLENGE TO LEGACY STORAGE
Stage          | INGEST                  | CLEAN & TRANSFORM    | EXPLORE        | TRAIN
Runs on        | From sensors & machines | CPU Servers          | GPU Server     | GPU Production Cluster
Access Pattern | sequential              | sequential or random | random         | random
Access Type    | write                   | read & write         | read           | read
File Size      | mostly large            | small to large       | small to large | mostly small
Concurrency    | high                    | high                 | low            | high

13 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don't Throw Your Data into Data Lake


14 DATA LAKE OR DATA GRAVEYARD?
"We see customers creating big data graveyards, dumping everything into HDFS [Hadoop Distributed File System] and hoping to do something with it down the road. But then they just lose track of what's there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents."
PricewaterhouseCoopers Technology Forecast, Issue 1, 2014


15 THE OLD WORLD OF DATA ANALYTICS
15 YEARS AGO Google File System was introduced, inspiring creation of Hadoop & HDFS
ASSUMPTIONS ABOUT DATA IN GFS & HDFS
Typical File is Large
Access is Sequential
Hardware Failure is the Norm
Data is Batched
Network is Slow
DATA PLATFORM IS DISTRIBUTED DISKS
Lots of Disk in Nodes
3x Data Replication
Batched Workflow
Fixed Compute to Storage

16 MODERN DATA CHANGES EVERYTHING
DATA IS NOW DIFFERENT (DECADE AGO vs. TODAY)
Small to Large Files
Random to Sequential Access
Real-time or Batched
Apps & Data Evolve Quickly
Elastic Infrastructure

17 MODERN ANALYTICS WITH OLD DATA LAKE
SPRAWLING, COMPLEX SILOS & SLOW PERFORMANCE
HDFS DATA LAKE: STATIC DATA LAKE NO LONGER VIABLE
Each App Locked into Physical Silos
Redundant Data Copies in Silos
Fixed Compute to Storage in Silo
Built for Large, Sequential Data
Optimized for Batch, Not Real-Time

18 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don't Throw Your Data into Data Lake
3. Cloud or Not to Cloud?

19 IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
     | EXPLORATION       | PRODUCTION
NEED | Start Immediately | Get New Products & Features to Market Faster than Competition

20 IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
           | EXPLORATION                     | PRODUCTION
NEED       | Start Immediately               | Get New Products & Features to Market Faster than Competition
DON'T NEED | Bogged Down with Infrastructure | Bogged Down by Performance & Cost Inefficiencies

21 IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
               | EXPLORATION                     | PRODUCTION
NEED           | Start Immediately               | Get New Products & Features to Market Faster than Competition
DON'T NEED     | Bogged Down with Infrastructure | Bogged Down by Performance & Cost Inefficiencies
RECOMMENDATION | Cloud                           | On-Premises

22 COST INEFFICIENCIES OF CLOUD
Comparing NVIDIA DGX-1 (Volta) & Pure Storage FlashBlade vs AWS EC2 & S3
Chart: cumulative cost ($) over TIME (1 Year, 2 Years, 3 Years)
Cloud: ~$800K in 3 Years
DGX-1 + FlashBlade: ~$300K (> 60% Savings)
Assumptions: Comparing DGX-1 Volta with FlashBlade system to AWS p3.16xlarge instance. AWS p3.16xlarge instance = $24.48/hour. AWS S3 cost: GET op = $0.004 per 10K requests, ignored other storage costs. 8 Volta GPUs deliver 4,100 images/sec with Caffe2. Assume 100% utilization for 3 years.
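The ~$800K cloud figure can be approximately reproduced from the assumptions listed on slide 22. The sketch below is an illustration, not the presenter's own calculation: it assumes one S3 GET request per image read and a 365-day year, and it treats the ~$300K on-premises figure as a fixed total. The function and constant names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365      # assumption: 365-day year
SECONDS_PER_HOUR = 3600


def cloud_cost(years,
               instance_per_hour=24.48,     # AWS p3.16xlarge price from the slide
               images_per_sec=4100,         # 8 Volta GPUs with Caffe2, from the slide
               get_price_per_10k=0.004):    # S3 GET price from the slide
    """Cumulative cloud cost in dollars at 100% utilization."""
    hours = years * HOURS_PER_YEAR
    compute = hours * instance_per_hour
    # Assumption: every image consumed by the GPUs costs one S3 GET request.
    get_requests = images_per_sec * hours * SECONDS_PER_HOUR
    requests_cost = get_requests / 10_000 * get_price_per_10k
    return compute + requests_cost


ON_PREM_COST = 300_000  # approximate DGX-1 + FlashBlade figure from the slide


def savings_fraction(years=3):
    """Fraction saved by the on-premises system relative to cloud."""
    return 1 - ON_PREM_COST / cloud_cost(years)


if __name__ == "__main__":
    for years in (1, 2, 3):
        print(f"{years} year(s): cloud ~ ${cloud_cost(years):,.0f}")
    print(f"3-year savings ~ {savings_fraction(3):.0%}")
```

Under these assumptions, compute accounts for roughly $643K of the three-year total and GET requests for roughly $155K, which is consistent with the slide's ~$800K and "> 60% Savings" claims.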

23 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don't Throw Your Data into Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks

24 BENCHMARKS DO NOT REFLECT REALITY
                | IMAGENET                  | REAL-WORLD AUTONOMOUS CAR COMPANY
IMAGE SIZE      | KB                        | 2-5MB
FILE SIZE       | 150MB (Packed TFRecords)  | 2-5MB
MODE OF TESTING | Synthetic (No I/O)        | Read from Storage

25 AI TRAINING SYSTEM
GOAL IS TO KEEP THE GPUs 100% BUSY
FULL TRAINING WORKFLOW
Steps: decode, scale, forward-propagation, back-propagation, update, evaluate
Resources: I/O, CPU, GPU
BENCHMARK SETUP
Setup #1 (GPU ONLY): Synthetic Data from System RAM into GPUs
Setup #2 (CPU + GPU): Real Image Data from System RAM Through CPU + GPU
Setup #3 (I/O + CPU + GPU): Real Image Data from FlashBlade into DGX-1
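One way to read the three benchmark setups is as a bottleneck model: each setup adds a stage (GPU, then CPU, then I/O), and the slowest stage in the chain limits how busy the GPUs can be. The sketch below is a simplified illustration under that assumption. The function names and all rates are hypothetical placeholders, not measurements from the presentation.

```python
def delivered_rate(stage_rates):
    """Images/sec the whole pipeline sustains: the slowest stage sets the pace."""
    return min(stage_rates.values())


def gpu_utilization(stage_rates):
    """Fraction of the GPU's capacity that the upstream stages can feed."""
    return delivered_rate(stage_rates) / stage_rates["gpu"]


# Hypothetical per-stage capacities in images/sec (illustrative only).
GPU_RATE = 4000.0
CPU_RATE = 5000.0
FAST_STORAGE_RATE = 6000.0
SLOW_STORAGE_RATE = 2000.0

setups = {
    # Setup #1: synthetic data, so only the GPU stage is exercised.
    "setup1_gpu_only": {"gpu": GPU_RATE},
    # Setup #2: real images from RAM, so the CPU stage is added.
    "setup2_cpu_gpu": {"cpu": CPU_RATE, "gpu": GPU_RATE},
    # Setup #3: real images from storage, so the I/O stage is added.
    "setup3_fast_storage": {"io": FAST_STORAGE_RATE, "cpu": CPU_RATE, "gpu": GPU_RATE},
    "setup3_slow_storage": {"io": SLOW_STORAGE_RATE, "cpu": CPU_RATE, "gpu": GPU_RATE},
}

if __name__ == "__main__":
    for name, rates in setups.items():
        print(f"{name}: {delivered_rate(rates):.0f} images/sec, "
              f"GPU {gpu_utilization(rates):.0%} busy")
```

In this model a synthetic benchmark always reports the GPUs as fully busy, because it never exercises the CPU or I/O stages. Only the third setup reveals whether storage can keep up.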

26 NEAR-LINEAR SCALE DELIVERED
AIRI ENGINEERED FOR MAXIMUM PRODUCTIVITY AND OUT-OF-THE-BOX SCALE
DEEP LEARNING TRAINING - MULTI-NODE USING GPUDIRECT RDMA OVER ETHERNET
Comparing Synthetic Mode, Entire Data in DRAM, Entire Data in FlashBlade

27 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don't Throw Your Data into Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks
5. Ideal Data Platform is a Data Hub

28 IDEAL PLATFORM FOR MODERN ERA
DYNAMIC DATA HUB ARCHITECTED FOR REAL-TIME & ELASTIC DATA
DATA PIPELINE / DATA HUB
TUNED FOR EVERYTHING: Small, Random to Large, Seq. Architected for the Unknown
REAL-TIME: Low Latency Performance for Instant Response
ALL-FLASH: Modern, Ultra-Fast Technology
PARALLEL: No Serial Bottlenecks for Max Throughput
ELASTIC: Grow Non-Disruptively with More App Clusters
SIMPLE: Focus More on Insights, Not Infrastructure
2018 PURE STORAGE INC.


29 THE INDUSTRY'S FIRST COMPLETE AI-READY INFRASTRUCTURE
HARDWARE
NVIDIA DGX-1: 4x DGX-1 Systems, 4 PFLOPS of DL Performance
PURE FLASHBLADE: 15x 17TB Blades, 1.5M IOPS
ARISTA: 2x 100Gb Ethernet Switches with RDMA
SOFTWARE
NVIDIA GPU CLOUD DEEP LEARNING STACK: NVIDIA Optimized Frameworks
AIRI SCALING TOOLKIT: Multi-node Training Made Simple
PURE PROPRIETARY

30 AI & MODERN ANALYTICS
POWERING ANALYTICS FOR WORLD'S LARGEST PUBLIC HEDGE FUND
AI CLEAN & LABEL: CPU Servers
AI EXPLORE: GPU Server
AI TRAIN: GPU Servers
SPARK: CPU Servers
MONGO: CPU Servers
"Our quants want to test a model, get the results, and then test another one, all day long. So a 10-20X improvement in performance is a game changer when it comes to creating a time-to-market advantage for us."
Gary Collier, co-CTO, Man AHL

31 TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don't Throw Your Data into Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks
5. Ideal Data Platform is a Data Hub