Brian Macdonald Big Data & Analytics Specialist - Oracle

Improving Predictive Model Development Time with R and Oracle Big Data Discovery
Brian Macdonald, Big Data & Analytics Specialist, Oracle
brian.macdonald@oracle.com
Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Agenda
- Impact of more data on the Data Scientist
- A day in the life of a Data Scientist
- Oracle Big Data Discovery
- Oracle R technologies

What are some of the key challenges for Data Scientists?
(Or analysts, statisticians, whatever you want to call them.)
Data
- Lots of data, everywhere
- Desire to leverage this data for predictive analytics
- Data scientists hiding in their offices and pulling data for themselves: "Just give me the data"
Working with the business
- Separate communities, yet they have the same objective: improve the business
- They speak a different language
- How to share information? Business users are usually not the math people. R?

Where is the data?
- Relational
- Hadoop
- The web
- Excel
- Files from data providers
- Other mysterious sources

How Oracle Brings it all Together: End-to-End Solutions
[Architecture diagram; recoverable labels: Data, Fast Data, Apps; Streams, Events, Actions; Custom, Packaged; Data Management; Reservoir, Factory, Warehouse; Business Analytics; Data Lab, Reports, Data Sets, Discovery, Data Science, Visualization]

A Day in the Life of a Data Scientist

Let's focus on predictive analytics
- Data scientists have the tools: SAS, SPSS, Matlab, R, Python, many others. We will discuss R.
- The business has the domain expertise: BI tools, Excel
- There is some crossover, but generally they work independently

What if they could work together more seamlessly? I will take the data scientist's perspective.

What does a data scientist do?
- Define problem
- Find data
- Exploratory Data Analysis
- Transform
- Modeling
- Share results
- Deploy
Callout on the slide: "This consumes most of the time"

Predictive Modeling with R
Pros
- Lots of sophisticated modeling capabilities
- Can transform and merge data
- Robust EDA capabilities
- Extensible
- Free
Cons
- It's a programming environment
- Needs expertise
- Can be tedious for EDA and transformations
- Generally limited to the memory of a laptop
- Doesn't help with finding data
- Single threaded

Where will the data scientist start to solve a problem?
- Manually ask or search around for data
- Generate statistics on the data: dim(orcl), head(orcl), summary(orcl), skewness(orcl)
- Generate some graphs to visualize: hist(orcl), plot(orcl), plot(lag_orcl)
- Start manipulating data: log_orcl <- log(orcl), lag_orcl <- diff(orcl)
All to see if the data is worth using.
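The commands on this slide can be strung together into a short script. The sketch below is only an illustration: the `orcl` series is synthetic (the deck's real data set is not included), and because base R has no `skewness()` function, a small moment-based helper stands in for whichever add-on package the demo used.

```r
# Synthetic stand-in for the slide's `orcl` series (assumption: a daily price vector).
set.seed(42)
orcl <- 40 * exp(cumsum(rnorm(250, mean = 0.0005, sd = 0.01)))

# Base R has no skewness(); this helper computes the moment-based sample skewness.
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# Basic statistics. dim() returns NULL for a plain vector, so length() is used here.
length(orcl)
head(orcl)
summary(orcl)
skewness(orcl)

# Manipulate the data: log transform and first differences.
log_orcl <- log(orcl)
lag_orcl <- diff(orcl)   # note: diff() gives changes between consecutive values

# Visual checks.
hist(orcl)
plot(orcl, type = "l")
plot(lag_orcl, type = "l")
```

Note that `diff()` returns a vector one element shorter than its input, which matters if the differenced series is later joined back to the original.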

R Demo

Powerful. But can this be automated and made easier?

What commands come next?

More code. Didn't I just do this for the other data sets?
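One way to reduce the repetition within R itself is to wrap the checks in a function and apply it to every data set. This is a minimal sketch with a hypothetical helper (`profile_series`) and synthetic data; it is not part of the original demo.

```r
# Hypothetical helper: run the same profile on any numeric vector.
profile_series <- function(x) {
  observed <- x[!is.na(x)]
  centered <- observed - mean(observed)
  c(
    n        = length(x),
    missing  = sum(is.na(x)),
    mean     = mean(observed),
    sd       = sd(observed),
    skewness = mean(centered^3) / mean(centered^2)^1.5
  )
}

# Two synthetic data sets standing in for "the other data sets".
set.seed(1)
datasets <- list(
  returns = rnorm(200, mean = 0, sd = 0.01),
  volumes = c(rexp(199, rate = 1 / 5000), NA)
)

# One row per data set, one column per statistic.
profiles <- t(sapply(datasets, profile_series))
print(round(profiles, 4))
```

This removes the copy-and-paste, but someone still has to find the data and load it into R first, which is the gap the following slides address.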

Oracle Big Data Discovery
Reducing the time for data science projects
Search, explore, transform, discover, deploy

How can Big Data Discovery help the data scientist?
- Find data
- Show core statistics about all the data
- Automatic visualization
- Transform the data, e.g. bin, log transform, ratios, sentiment
- Place data in Hadoop to start modeling
This is what consumed much of the data scientist's time.
What BDD doesn't do (yet)
- Predictive analytics
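For readers unfamiliar with the transformations named above, here is what binning, a log transform, and a ratio look like when written by hand in R. This is only a sketch of the operations themselves on a made-up `customers` table; it does not use or represent any Big Data Discovery API, and the column names and bin boundaries are assumptions.

```r
# Made-up example data (assumption: not from the deck).
customers <- data.frame(
  income = c(32000, 54000, 120000, 18000, 76000),
  debt   = c(8000, 27000, 30000, 9000, 19000)
)

# Bin: turn a continuous value into labeled ranges.
# Intervals are (0, 30000], (30000, 80000], (80000, Inf].
customers$income_bin <- cut(
  customers$income,
  breaks = c(0, 30000, 80000, Inf),
  labels = c("low", "mid", "high")
)

# Log transform: compress a right-skewed scale.
customers$log_income <- log(customers$income)

# Ratio: combine two columns into one derived feature.
customers$debt_to_income <- customers$debt / customers$income

print(customers)
```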

Transform
Business users can help with this!
- Scrub PII
- Missing data
- Entity normalizations
- Outlier detection
Benefits
- Share transformations in real time
- Can be done before the data scientist gets involved
Also listed on the slide
- Outlier elimination
- Centering & scaling features
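Three of the steps listed above (missing data, outlier detection and elimination, centering and scaling) can be sketched in base R as follows. The data, the median imputation, and the 1.5 × IQR rule are illustrative assumptions; the slide does not say which methods Big Data Discovery applies.

```r
# Made-up sensor readings with one missing value and one obvious outlier.
readings <- c(10.2, 11.1, NA, 9.8, 10.5, 55.0, 10.9)

# Missing data: replace NA with the median of the observed values.
filled <- ifelse(is.na(readings), median(readings, na.rm = TRUE), readings)

# Outlier detection: flag values more than 1.5 * IQR outside the quartiles.
q <- unname(quantile(filled, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- filled < q[1] - fence | filled > q[2] + fence

# Outlier elimination, then centering and scaling to mean 0 and sd 1.
kept <- filled[!is_outlier]
scaled <- as.numeric(scale(kept))

print(data.frame(kept = kept, scaled = round(scaled, 3)))
```

The order matters: scaling before removing the outlier would let the extreme value dominate the mean and standard deviation.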

Big Data Discovery Demo

The Data Scientist needs more, though
How can I do modeling if the data is in Hadoop?
- Copy data to R? What if there is too much data?
- Use R to connect to Hadoop and run models on Hadoop
- Or use R to connect to Oracle DB and run models on Oracle
- Or use R connected to Oracle DB with data on Hadoop

Oracle's R Technologies
- Oracle R Distribution and ROracle: software available to the R community for free
- Oracle R Advanced Analytics for Hadoop: a component of the Oracle Big Data Connectors software suite
- Oracle R Enterprise: a component of the Oracle Advanced Analytics option to Oracle Database

Oracle R Advanced Analytics for Hadoop: Integration
Using the Hadoop and Hive integration, plus the R engine and open-source R packages
[Architecture diagram; recoverable labels:]
- R Client: R analytics, Oracle R Advanced Analytics for Hadoop
- SQL clients: SQL Developer, other SQL apps
- Hadoop cluster with Oracle R Advanced Analytics for Hadoop
  - HQL: basic statistics, data prep, joins, and view creation
  - ORAAH distributed algorithms: MLP Neural Nets*, GLM*, LM, PCA, k-means, NMF, LMF
  - Open-source R packages via Map-Reduce
  - * Spark caching enabled
- Oracle Database server with Advanced Analytics option

Oracle R Advanced Analytics for Hadoop
Advanced Analytics algorithms in a Hadoop cluster: Map-Reduce and Spark based
- Classification: Generalized Linear Model, Logistic Regression
- Clustering: Hierarchical k-means
- Statistical Functions: Correlation, Covariance, Cross Tabulation, Summary Statistics
- Regression: Linear Regression, Multi-Layer Neural Networks
- Attribute Importance: Principal Components Analysis
- Feature Extraction: Nonnegative Matrix Factorization (NMF)
- Collaborative Filtering (LMF)
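To make the algorithm families above concrete, the sketch below runs single-machine equivalents of four of them using only base R and the built-in `mtcars` data set. It deliberately does not call any ORAAH function; the distributed versions have their own interfaces, which this deck does not show.

```r
data(mtcars)
numeric_cols <- c("mpg", "wt", "hp")

# Classification: logistic regression as a generalized linear model.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial())

# Clustering: k-means on standardized features (plain, not hierarchical, k-means).
set.seed(7)
clusters <- kmeans(scale(mtcars[, numeric_cols]), centers = 3, nstart = 10)

# Principal components analysis on the same features.
pca <- prcomp(mtcars[, numeric_cols], scale. = TRUE)

# Statistical functions: correlation matrix and summary statistics.
correlations <- cor(mtcars[, numeric_cols])

print(coef(logit_fit))
print(table(clusters$cluster))
print(summary(pca))
print(round(correlations, 2))
```

Everything here runs in the memory of one R session, which is exactly the limit the Hadoop-based algorithms are meant to remove.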

Oracle R Advanced Analytics for Hadoop Demo

OAA with Big Data SQL: Exadata + BDA
Using the in-database algorithms, plus the R engine and open-source R packages if desired
[Architecture diagram; recoverable labels:]
- Oracle Big Data Appliance
- Oracle Exadata with Advanced Analytics option
- R Client: R analytics, Oracle R Enterprise
- SQL clients: SQL Developer, other SQL apps
- Big Data SQL

Oracle Advanced Analytics: In-Database Machine Learning
Using the in-database algorithms, plus the R engine and open-source R packages if desired
[Architecture diagram; recoverable labels:]
- Oracle Database server with Advanced Analytics option
  - SQL: basic statistics and joins
  - Data mining and predictive analytics: 15 PL/SQL in-database algorithms
- R Client: R analytics, Oracle R Enterprise
  - ORE parallel algorithms: MLP Neural, Stepwise, LM, GLM, PCA
  - Access to open-source R packages
- SQL clients: SQL Developer, other SQL apps

Oracle Advanced Analytics
Predictive analytics algorithms in-database
- Classification: Logistic Regression, Decision Trees, Random Forests, Naïve Bayes, Support Vector Machines
- Clustering: Hierarchical k-means, Hierarchical O-Cluster, Expectation-Maximization
- Regression: Linear Regression, Support Vector Machines, Multi-Layer Neural Networks, Random Forests
- Anomaly Detection: One-Class SVM
- Association Rules: Apriori
- Text Mining: Tokenization, Theme Extraction
- Attribute Importance: Minimum Description Length, Principal Components Analysis
- Feature Extraction: Nonnegative Matrix Factorization (NMF), Singular Value Decomposition (SVD)

Now that the Model is Built and Scored, What's Next?
- Provide business users access to scored models
- Import scored data to BDD
- Share insights with business users
- Allow further discovery and analysis
- Operationalize the model
- Business users will provide guidance
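As a rough sketch of the hand-off step, the code below scores a model in R and writes the scored rows to a flat file that a discovery tool could load. The CSV export, the file name, and the `mtcars` stand-in model are assumptions made for illustration; the deck does not show how scored data was actually imported into BDD.

```r
data(mtcars)

# Stand-in model (assumption): probability that a car has a manual transmission.
fit <- glm(am ~ wt, data = mtcars, family = binomial())

# Score every row and attach the results to the original data.
scored <- mtcars
scored$car <- rownames(mtcars)
scored$p_manual <- predict(fit, newdata = mtcars, type = "response")
scored$predicted_manual <- scored$p_manual >= 0.5

# Export the columns business users need (hypothetical file name and location).
out_file <- file.path(tempdir(), "scored_cars.csv")
write.csv(
  scored[, c("car", "wt", "am", "p_manual", "predicted_manual")],
  out_file,
  row.names = FALSE
)
```

Keeping the input features next to the score in the export lets business users explore why a row scored the way it did, rather than seeing the score alone.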

Import Scored Data Demo

Besides providing technology, how can Oracle help?

Oracle Analytics Assessment Engagement Process
[Process diagram; recoverable labels, in slide order:]
- Business case redevelopment (if needed)
- Socialization of results widely across the business
- Data Science, Business Process Improvement
- Stages: Data Summit Preparation and Discovery Workshop; Business Case; Analytics Sprints; Analytics Assessment Deliverable
- Oracle-led presentation to key stakeholders. Output: agreement to do the DSS and a target use case
- Value-led discovery. Output: a hypothesis backed by business case metrics, data sources, etc. needed for the sprints
- Oracle specialist-led storyboard and demo development against the hypothesis

Summary: Oracle Value Proposition for Data Scientists
- Increase productivity
- Expand capabilities
- Deploy effectively
How?
- Enabling technology
- Partnership model
To: deliver business process improvement through math and data

OAA Links and Resources
Oracle Advanced Analytics overview
- OAA presentation: Big Data Analytics in Oracle Database 12c With Oracle Advanced Analytics & Big Data SQL
- Big Data Analytics with Oracle Advanced Analytics: Making Big Data and Analytics Simple (white paper on OTN)
- Oracle internal OAA Product Management wiki and workspace
YouTube recorded OAA presentations and demos
- Oracle Advanced Analytics and Data Mining at the YouTube Movies (6+ OAA live demos on ODMr 4.0 new features, retail, fraud, loyalty, overview, etc.)
Getting started
- Link to Getting Started w/ ODM blog entry
- Link to new OAA/Oracle Data Mining 2-day instructor-led Oracle University course
- Link to OAA/Oracle Data Mining 4.0 Oracle by Examples (free) tutorials on OTN
- Take a free test drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud
- Link to OAA/Oracle R Enterprise (free) tutorial series on OTN
Additional resources
- Oracle Advanced Analytics Option on OTN page
- OAA/Oracle Data Mining on OTN page, ODM documentation & ODM blog
- OAA/Oracle R Enterprise page on OTN page, ORE documentation & ORE blog
- Oracle SQL-based basic statistical functions on OTN
- BIWA Summit 16, Jan 26-28, 2016: Oracle Big Data & Analytics User Conference @ Oracle HQ Conference Center

Books on Oracle Advanced Analytics & Big Data
Books available on Amazon
- Predictive Analytics Using Oracle Data Miner: Develop for ODM in SQL & PL/SQL
- Using R to Unlock the Value of Big Data
- Oracle Big Data Handbook