How to develop Data Scientist Super Powers! Using Azure from R to scale and persist analytic workloads. Simon Field


1 How to develop Data Scientist Super Powers! Using Azure from R to scale and persist analytic workloads. Simon Field

2 Topics
- Why cloud?
- Managing cloud resources from R
- Highly parallelised model training
- Persistent, scalable model deployment

3 Infinite compute power:
- Exploratory data analysis
- Big compute
- Embarrassingly parallel model tuning, validation, optimisation and simulation
- Dynamic / fluctuating / elastic / temporary workloads
- Batch / scheduled workloads: pay per use
- Persistent / always-on workloads
- A really fast internet connection

4 R in Azure: tooling & processing, big processing, data storage, SQL Server (IaaS)

5 Do you ever wish you had a more powerful laptop? Azure Data Science Virtual Machines (DSVM):
- Custom VM image on the Azure Marketplace
- Includes a comprehensive set of data science and Azure tools/SDKs, all pre-configured and ready to use
- Pay for cloud hardware usage only. No software charges!
- Pointers to the gallery, samples and documentation
- Windows and Linux (Ubuntu & CentOS)
- Build your first model in 30 minutes or less!
Try the DSVM for free today.

6 Do you ever wish you had multiple laptops? Azure Batch: cloud-based HPC for embarrassingly parallel problems. An app splits the work into many individual tasks, and the tasks are assigned to many computers/VMs.
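To make "many individual tasks assigned to many computers" concrete, here is a minimal local sketch. It assumes base R's `parallel` package, with two local worker processes standing in for Azure Batch VMs. The task (a Monte Carlo estimate of pi) is a hypothetical stand-in. The point is that each task depends only on its own input, so tasks can run in any order on any worker.

```r
library(parallel)

# Hypothetical task: estimate pi from one independent batch of random points.
# Each call depends only on its own input, which is what makes the
# workload embarrassingly parallel.
estimate_pi <- function(n_points) {
  x <- runif(n_points)
  y <- runif(n_points)
  4 * mean(x^2 + y^2 <= 1)
}

cl <- makeCluster(2)                 # 2 local worker processes stand in for VMs
clusterSetRNGStream(cl, iseed = 42)  # reproducible, independent random streams per worker
tasks <- rep(1e5, 8)                 # 8 individual tasks of 100,000 points each
estimates <- parLapply(cl, tasks, estimate_pi)
stopCluster(cl)                      # release the workers

pi_hat <- mean(unlist(estimates))    # combine the per-task results
print(pi_hat)
```

On Azure Batch the same split, run and combine shape applies. Only the place where the tasks execute changes.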

7 Using Azure from R
- AzureSMR: manage and interact with Azure resources (e.g. virtual machines, storage, Hadoop clusters with Spark and Hive)
- doAzureParallel: a parallel backend for starting and running embarrassingly parallel processing across a cluster in Azure
- AzureML: interface with Azure Machine Learning datasets and experiments, and publish and consume R web services
- Plus data access packages, e.g. RODBC, dplyr, dplyrXdf etc.

8 AzureSMR package: create & start a new DSVM, e.g. from the Windows DSVM one-box template (Template Library link)

9 AzureSMR package: start an existing DSVM

10 AzureSMR package: connect to a remote DSVM. There are multiple ways to connect to the DSVM and execute R on it.

11 AzureSMR package: stop a DSVM

12 foreach: a parallelised for loop, used for parallelisation by many popular packages, e.g. caret, plyr etc. It supports multiple parallel backends, including doAzureParallel and doRSR from Microsoft R (Spark, SQL Server, Teradata), and works with iterators.
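The idea behind foreach is that a loop whose iterations are independent can be handed to whatever backend is registered. With foreach it would be written `foreach(i = 1:20) %dopar% boot_mean(i, x)`, and doAzureParallel would then send the iterations to Azure. This sketch avoids needing foreach installed by using base R's `parallel` package instead. It shows the same loop body run as an ordinary `for` loop and as parallel tasks, and checks that both give the same answer. `boot_mean` is a hypothetical example body.

```r
library(parallel)

# Hypothetical loop body: one bootstrap estimate of the mean of x.
boot_mean <- function(i, x) {
  set.seed(i)                      # per-iteration seed: result does not depend on which worker runs it
  mean(sample(x, replace = TRUE))
}
x <- mtcars$mpg

# 1. Ordinary sequential for loop
serial <- numeric(20)
for (i in 1:20) serial[i] <- boot_mean(i, x)

# 2. The same iterations as independent tasks on a local cluster
cl <- makeCluster(2)
par_res <- unlist(parLapply(cl, 1:20, boot_mean, x = x))
stopCluster(cl)

all.equal(serial, par_res)         # TRUE: same results, computed in parallel
```

Seeding inside the loop body matters. Without it, results would depend on how iterations happen to be scheduled across workers.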

13 doAzureParallel use cases. Parallelise this! Data engineering, model selection, hyperparameter tuning.
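Hyperparameter tuning fits this pattern because each candidate setting is scored independently. The sketch below is hypothetical: it picks the polynomial degree `d` in `mpg ~ poly(hp, d)` on the built-in `mtcars` data by 5-fold cross-validated RMSE. Each degree is scored on a separate local worker from base R's `parallel` package, which stands in for a doAzureParallel pool.

```r
library(parallel)

# Score one candidate degree by k-fold cross-validated RMSE.
cv_rmse <- function(d, data, k = 5) {
  set.seed(1)                                          # identical folds for every candidate
  folds <- sample(rep(1:k, length.out = nrow(data)))
  errs <- sapply(1:k, function(f) {
    train <- data[folds != f, ]
    test  <- data[folds == f, ]
    fit <- lm(mpg ~ poly(hp, d), data = train)
    sqrt(mean((test$mpg - predict(fit, test))^2))
  })
  mean(errs)
}

degrees <- 1:4                                         # the hyperparameter grid
cl <- makeCluster(2)
scores <- unlist(parLapply(cl, degrees, cv_rmse, data = mtcars))
stopCluster(cl)

best <- degrees[which.min(scores)]                     # lowest CV error wins
```

Using the same folds for every candidate keeps the comparison fair. It also means the parallel scores match a plain `sapply(degrees, cv_rmse, data = mtcars)`.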

14 doAzureParallel: set up credentials/configuration. One-time step: generate the credential and cluster configuration files, then fill in their parameters.

15 doAzureParallel example usage (continued): set credentials, create the cluster and register the parallel backend. The Azure portal (ms.portal.azure.com) shows the pool going from Starting to Started, then Idle.

16 doAzureParallel example usage (continued): run a simple foreach test to check the parallel workers. The Azure portal (ms.portal.azure.com) shows the nodes going from Idle to Running and back to Idle.
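A quick smoke test like the one on this slide checks that the work really lands on separate workers. This local sketch uses base R's `parallel` package. Two local processes stand in for the Azure pool, and each task reports the process id of the worker that ran it.

```r
library(parallel)

cl <- makeCluster(2)                              # local stand-in for the Azure pool
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))    # one process id per worker

# Send more tasks than workers and record which worker ran each one
task_pids <- unlist(parLapply(cl, 1:10, function(i) Sys.getpid()))
stopCluster(cl)

length(unique(pids))   # number of distinct workers
table(task_pids)       # how the 10 tasks were spread across them
```

If every task reported the same id, the backend would not be running in parallel.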

17 doAzureParallel example usage (continued): combined model & feature selection

18 doAzureParallel example usage (continued): combined model & feature selection...
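Combining model selection with feature selection multiplies the number of candidates, which is where a parallel backend pays off. This sketch is hypothetical and does not reproduce the code on the slides. It crosses every non-empty subset of four `mtcars` predictors with two linear model forms (additive vs. all pairwise interactions). Each of the 30 candidates is scored by 5-fold CV RMSE on a local worker from base R's `parallel` package.

```r
library(parallel)

features <- c("hp", "wt", "disp", "qsec")
# All non-empty feature subsets (15 of them)
subsets <- unlist(lapply(seq_along(features),
                         function(m) combn(features, m, simplify = FALSE)),
                  recursive = FALSE)
# Cross each subset with two hypothetical model forms: 15 x 2 = 30 candidates
grid <- expand.grid(subset = seq_along(subsets),
                    form = c("additive", "interactions"),
                    stringsAsFactors = FALSE)

# Score candidate j (one feature subset + one model form) by k-fold CV RMSE
score_candidate <- function(j, grid, subsets, data, k = 5) {
  rhs <- paste(subsets[[grid$subset[j]]], collapse = " + ")
  if (grid$form[j] == "interactions") rhs <- paste0("(", rhs, ")^2")
  fml <- as.formula(paste("mpg ~", rhs))
  set.seed(1)                                   # same folds for every candidate
  folds <- sample(rep(1:k, length.out = nrow(data)))
  errs <- sapply(1:k, function(f) {
    fit <- lm(fml, data = data[folds != f, ])
    test <- data[folds == f, ]
    sqrt(mean((test$mpg - predict(fit, test))^2))
  })
  mean(errs)
}

cl <- makeCluster(2)
grid$rmse <- unlist(parLapply(cl, seq_len(nrow(grid)), score_candidate,
                              grid = grid, subsets = subsets, data = mtcars))
stopCluster(cl)

grid[which.min(grid$rmse), ]                    # best subset + model form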

19 doAzureParallel example usage (continued): once finished, stop the cluster to stop incurring costs. The Azure portal (ms.portal.azure.com) shows the pool going from Idle to Resizing as the node count drops from 4 to 0.
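Forgetting to stop a cloud cluster costs money, so it helps to make shutdown automatic. This sketch uses `on.exit()` so the cluster is stopped even if the job fails. It uses base R's `parallel` package locally, and `with_cluster` is a hypothetical helper. The same pattern should apply to a doAzureParallel cluster, assuming its `stopCluster()` is called in the same way.

```r
library(parallel)

# Hypothetical helper: create a cluster, run a job on it, and always shut it down.
# on.exit() fires on normal return *and* on error, so workers are never left running.
with_cluster <- function(n_workers, job) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  job(cl)
}

squares <- with_cluster(2, function(cl) unlist(parLapply(cl, 1:5, function(i) i^2)))
print(squares)

# A failing job still triggers stopCluster() before the error propagates
res <- tryCatch(with_cluster(2, function(cl) stop("job failed")),
                error = function(e) conditionMessage(e))
```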

20 doAzureParallel on Azure Batch. Azure Batch is a platform service that provides easy job scheduling and cluster management, allowing applications or algorithms to run in parallel at scale.
- Capacity on demand; jobs on demand
- Autoscale resources using minimum and maximum node settings
- Minimal cluster management (node failures, installation, etc.)
- Hardware flexibility: use any VM size, including GPU instance types
- Pay by the minute: cost effective, with no charge for Batch itself; you only pay for the VMs
- Low-priority VMs: extremely cost effective for non-critical work
- Suits compute-intensive rather than I/O-intensive work
Azure Batch is a good fit if you want to run jobs using elastic compute. Other Azure alternatives:
- Azure HDInsight: Hadoop/Spark clusters pre-configured with R, RStudio Server etc., for big data
- Azure Data Lake Analytics: on-demand analytics jobs as a service, SQL with R & Python

21 Performance example: creating a Mandelbrot set. Scaling the Mandelbrot set computation takes advantage of flexible hardware options and sizes:
- General compute VMs (A-series / D-series)
- Memory/storage optimised (G-series)
- Compute optimised (F-series)
- GPU enabled (N-series)
The chart compares the local machine with 5, 10 and 20 parallel workers.
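The Mandelbrot set parallelises naturally because every row of the image can be computed independently. This sketch computes escape-time counts row by row and checks that a parallel run matches the serial one. It assumes base R's `parallel` package with 2 local workers. Grid size and `max_iter` are arbitrary choices, and no timings are claimed. Scaling to 5, 10 or 20 workers on Azure would mean sending the same row tasks to a larger backend.

```r
library(parallel)

# Escape-time count for each point cc = re + i*im in one row:
# iterate z <- z^2 + cc while |z| <= 2, for at most max_iter steps.
mandelbrot_row <- function(im, re, max_iter = 50) {
  cc <- complex(real = re, imaginary = im)
  z <- complex(length(re))                  # all zeros
  counts <- integer(length(re))
  for (k in seq_len(max_iter)) {
    active <- Mod(z) <= 2                   # points that have not escaped yet
    z[active] <- z[active]^2 + cc[active]
    counts[active] <- counts[active] + 1L
  }
  counts
}

re <- seq(-2, 0.5, length.out = 200)
im <- seq(-1.25, 1.25, length.out = 100)

serial <- t(sapply(im, mandelbrot_row, re = re))    # 100 x 200 matrix, one row per im value

cl <- makeCluster(2)
par_rows <- parLapply(cl, im, mandelbrot_row, re = re)  # one task per image row
stopCluster(cl)
parallel_m <- do.call(rbind, par_rows)

identical(serial, parallel_m)
```

Wrapping the `serial` and `parLapply` steps in `system.time()` gives a rough local comparison. Actual speed-ups depend on the hardware and on per-task overhead.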

22 Build your optimal model first. AzureML: deploy it as a web service in Azure Machine Learning.
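Before deploying, the model is usually wrapped in a plain R function that takes a data frame and returns predictions. A function of this kind is what a publishing call such as AzureML's `publishWebService()` takes. That call needs an Azure ML workspace and is not shown here. This sketch fits a hypothetical `lm` on `mtcars`, wraps it, and tests the scoring function locally. The column name `predicted_mpg` is illustrative.

```r
# Build the model first (hypothetical example model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Scoring function: data frame in, data frame of predictions out
score <- function(newdata) {
  data.frame(predicted_mpg = unname(predict(fit, newdata = newdata)))
}

# Test locally before publishing
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
out <- score(new_cars)
print(out)
```

Keeping the scoring logic in a small, self-contained function makes it easy to test offline. The same function can then be reused with a different deployment route, such as mrsdeploy.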

23 Getting started / questions
- Data Science Virtual Machine: aka.ms/dsvm, aka.ms/dsvmhandout, aka.ms/dsvmtenthings
- Provisioning a DSVM: Azure account with $200 free credit
- Demo & this presentation
- How to find me: Simon Field, Data Architect, sifield@microsoft.com
- AzureSMR: AzureSMR GitHub site, Azure R Quickstart Templates, Azure Quickstart Templates (all)
- doAzureParallel: doAzureParallel blog, doAzureParallel GitHub site
- AzureML: Getting Started vignette
- mrsdeploy: documentation site

24 THANK YOU!


26 Azure DSVM: remote access/connectivity, for both Windows and Linux DSVMs
- Browser to Jupyter
- mrsdeploy remote session
- Visual Studio remote workspace (Windows DSVM)
- Server remote desktop session
- ssh / X Windows (Linux DSVM)
- mrsdeploy: REST API / apps

27 Package: secret, used to store keys/credentials
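Whatever tool stores them, keys should not be hard-coded in scripts that get shared or committed. The secret package goes further by encrypting values in a vault with users' public keys. As a simpler, swapped-in alternative, this base-R sketch reads a credential from an environment variable and fails clearly if it is missing. The helper `get_credential` and the variable name `AZURE_BATCH_KEY` are hypothetical.

```r
# Hypothetical helper: fetch a credential from an environment variable
# instead of hard-coding it in the script.
get_credential <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop(sprintf("Credential '%s' is not set", name), call. = FALSE)
  }
  value
}

# Demo only: in practice, set the variable outside R (e.g. in .Renviron or the shell)
Sys.setenv(AZURE_BATCH_KEY = "example-not-a-real-key")
key <- get_credential("AZURE_BATCH_KEY")
```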

28 R deployment options: web services. Build the optimal model first, then use mrsdeploy to deploy it as a web service in Microsoft R (IaaS).