Azure ML Studio. Overview for Data Engineers & Data Scientists

Size: px
Start display at page:

Download "Azure ML Studio. Overview for Data Engineers & Data Scientists"

Transcription

1 Azure ML Studio Overview for Data Engineers & Data Scientists

2 Rakesh Soni, Big Data Practice Director Randi R. Ludwig, Ph.D., Data Scientist Daniel Lai, Data Scientist

3 Intersys Company Summary Overview Privately held IT services firm 150+ consultants spanning the full IT space Leverage a local, national and/or global model as appropriate for each customer and engagement Key Industries Retail and Hospitality Financial Services Healthcare High Tech Manufacturing Media and Advertising Core Values Be Accountable Bring Excellence Be Authentic Be in Service to Others Phoenix Austin NYC 3

4 Core Practice Capabilities Big Data & Analytics Big Data Analytics & Data Science Enterprise Search Business Intelligence Information Management Application Services Application Development Modern Web Mobility Cloud DevOps / Agile Technology Staffing Infrastructure Project Management Packaged Solutions Quality Assurance Cloud Project & Program Management Quality Assurance Assessment Strategy & Roadmap 4

5 Agenda Intersys Overview Machine Learning Azure Machine Learning Studio Main Features of Azure Machine Learning Studio Demo 1 Predicting Income Category Demo 2 Predicting Patient Readmission Q&A 5

6 Machine Learning What Is It?

7 Machine Learning All Around Us Credit: Google Deep Mind, Strategic Game Play Reinforced learning to take actions with the highest reward Credit: BBC News, 7 Autonomous Technology Training computers to be make intelligent and intuitive decisions

8 Machine Learning in Real Life Optimize Business Decisions Real time insights into customer behavior Credit: UiBS Microsoft Partner 8

9 Machine Learning How Does it Work? Credit: Google Research, "Machine Learning 101" Reported at: Data: examples used to train and validate model Model: the system that makes predictions or classifications Parameters: the signals or factors used by the model to form its decisions Learner: the system that adjusts the parameters and in turn the model by looking at differences in predictions versus actual outcome. 9

10 Azure Machine Learning Studio

11 Azure ML Studio What is it? Collaborative, drag-anddrop, fully managed cloud platform Build, test, and deploy predictive analytics Publish models as web services to be consumed by custom apps or BI tools. 11

12 Azure ML Studio Public Gallery Don t want to build a machine learning model from scratch? Avoid reinventing the wheel by browsing the public gallery for existing models that meet your needs. Add picture of gallery 12

13 Azure ML Studio Workspace To have full control, open your own Experiment. For any part of the process you can use built in features. Find these via the search bar or browse the categories shown. Once you ve found the module you need, just drag and drop onto the workspace. 13

14 Data Import And Export Options Bring in data in many forms. Export wherever you need it. 14

15 Easy To Create And Visualize Workflow Easy to follow workflow for creating and sharing. As you re creating your model, you can see dependencies in your process. When sharing your results, the workflow tells your story. 15

16 Dataset Statistics At Every Step Includes quick column statistics on your whole dataset Once your data is loaded into Azure ML, you can explore overall distributions of each feature. You can also quickly identify columns with many missing values. 16

17 Algorithm Selection Accuracy Training time Linearity Parameters Logistic Regression 5 Support Vector Machine * 5 Neural Network 9 Boosted decision tree 6 Decision forest 6 17 Classification Regression Accuracy Training time Linearity Parameters Logistic Regression 4 Bayesian Linear Regression 2 Neural Network 9 Boosted decision tree 5 Decision forest 6 Anomaly Detection Accuracy Training time Linearity Parameters Principal Component Analysis 3 K-means Clustering 4 Good performance Moderate performance * Good for large feature sets Large memory footprint Credit: Brandon Rohrer, azure.microsoft.com

18 Publish Output As A Web Service and BI Visualization 18

19 Azure Machine Learning Demo

20 Demo 1 - Predicting Income Category Azure ML Tour 20

21 Demo 2 Predicting Patient Readmission 21

22 Azure Machine Learning Studio Summary

23 Azure ML Summary Interactive & visual workspace Various data sources supported: SQL Server, HIVE tables, CSV file etc. Dataset statistics and easy exploration Data cleaning & transformation Many modeling algorithms included out of the box SQL/R/Python code can be included in workflow Many built-in ways to evaluate and compare models using standard performance metrics 23

24 Resources Microsoft's provided documentation is quite thorough and helpful: 24

25 25 Thank you. Any questions?

26 Q&A Please submit your questions into the chat field.

27 Demo Resource Slides

28 Demo 1: This is model that predicts whether a person earns > $50k. Here we show how to build a model from scratch, including training the model, predicting values for the test set, and evaluating results. 28

29 Input your data 29

30 Summary statistics 30

31 How to Clean Missing Data 31

32 How to Limit the Number of Features in Your Model 32

33 Splitting into Training/Test Sets 33

34 Choosing a Model 34

35 Training the Model 35

36 Which Features are Most Predictive? 36

37 Which Features are Most Predictive? 37

38 Predicting Values for Test Data 38

39 Check Predictions (Scored Labels) 39

40 Evaluate Model 40

41 Check Model Accuracy, Etc. 41

42 End of Demo 1 42

43 Demo 2: This model predicts whether a patient will be readmitted to a hospital for further treatment. This model considers a variety of strong machine learning algorithms. It then tunes the strongest model to be more efficient, evaluation of prediction results, and implements custom code in from R, Python, and SQL for further visualization and examination of data. 43

44 Explore the Dataset 44

45 Impute Missing Values 45

46 Compare Models: Decision Jungle 46

47 Compare Models: Boosted Decision Tree 47

48 Compare Models: Logistic Regression 48

49 Compare Models: Neural Network 49

50 Cross Validation Results: Variation in Accuracy for All Algorithms 50

51 ROC Chart comparison for models 51

52 Move Forward with Most Accurate Model 52

53 How to Cross Validate a Tuned Model 53

54 Assign Folds 54

55 Set Model Parameter Ranges 55

56 Tune Model Parameters 56

57 Use Model to Predict Values for Test Set 57

58 Visualize Predicted Values 58

59 Permute Features to Find Most Important 59

60 Find Features Most Influential to the Model 60

61 Can Export Results for Multiple Uses 61

62 Export CSV for debugging in notebooks 62

63 Sample R Notebook 63

64 SQL scripts 64

65 65

66 Find optimal cutoff in R script 66

67 R result 67

68 Python script for further visualizations 68

69 Python Visualization 69

70 Evaluate Final Model 70

71 Results 71