CIS 602-01: Scalable Data Analysis

CIS 602-01: Scalable Data Analysis Cloud Computing Dr. David Koop

Data Science Tasks (share of respondents, major involvement only) [O'Reilly]
- Basic exploratory data analysis: 69%
- Conducting data analysis to answer research questions: 61%
- Communicating findings to business decision-makers: 58%
- Data cleaning: 53%
- Creating visualizations: 49%
- Identifying business problems to be solved with analytics: 47%
- Feature extraction: 43%
- Developing prototype models: 43%
- Organizing and guiding team projects: 39%
- Implementing models/algorithms into production: 36%
- Collaborating on code projects (reading/editing others' code, using Git): 32%
- Teaching/training others: 31%
- Planning large software projects or data systems: 30%
- Developing dashboards: 30%
- ETL: 29%
- Communicating with people outside your company: 28%
- Setting up/maintaining data platforms: 24%
- Developing data analytics software: 20%
- Developing products that depend on real-time data analytics: 19%
- Using dashboards and spreadsheets (made by others) to make decisions: 19%
- Developing hardware (or working on software projects that require expert knowledge of hardware): 5%

Programming Languages (share of respondents) [O'Reilly]
- SQL: 70%
- R: 57%
- Python: 54%
- Bash: 24%
- Java: 18%
- JavaScript: 17%
- Visual Basic/VBA: 13%
- C++: 9%
- Matlab: 9%
- Scala: 8%
- C: 8%
- C#: 8%
- Perl: 5%
- SAS: 5%
- Ruby: 3%
- Octave: 2%
- Go: 1%
[Chart: salary median and IQR (US dollars) by language, axis 0 to 200K]

Spreadsheet & Business Intelligence Tools (share of respondents) [O'Reilly]
- Excel: 69%
- PowerPivot: 10%
- Power BI: 8%
- QlikView: 7%
- BusinessObjects: 6%
- Cognos: 6%
- Oracle BI: 5%
- Spotfire: 4%
- Pentaho: 3%
- Adobe Analytics: 3%
- Microstrategy: 3%
- Alteryx: 2%
- Jaspersoft: 1%
- Datameer: 1%
[Chart: salary median and IQR (US dollars) by tool, axis 30K to 150K]

Trends in Data Analysis Tool Use [KDNuggets]

Python Tools
matplotlib:
- Python's "base" visualization library
- Originally mimicked the functions in MATLAB
- Integrated with pandas .plot calls
- Jupyter Notebook: %matplotlib inline or %matplotlib notebook
- seaborn adds extras on top of matplotlib
scikit-learn:
- Machine learning library
- Fit and predict using models
- Classification and regression
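The "fit and predict" pattern on the scikit-learn slide can be sketched roughly as follows. This is a minimal illustration, not material from the lecture: the toy data is made up, and LogisticRegression is just one classifier picked for the example (any scikit-learn estimator follows the same fit/predict calls).

```python
from sklearn.linear_model import LogisticRegression

# Made-up toy data (assumption): one feature, two well-separated classes.
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Fit: learn model parameters from labeled examples.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict: assign a class label to each unseen sample.
predictions = model.predict([[1.5], [8.5]])
print(predictions)
```

For regression the calls look the same; only the estimator changes (for example, LinearRegression), and predict returns numeric values instead of class labels.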

Reading Response
Due Thursday (10/12) before class
Read Reiss et al.
Short Summary + Critique
Questions:
- Do you agree with the conclusions?
- What explanations could there be for the results?
- Does the paper suggest improvements in cloud computing based on workload analysis?

Assignment 2
http://www.cis.umassd.edu/~dkoop/cis602-2017fa/assignment2.html
New York City Trees
- 680,000+ trees
- Use WebGL for visualization
- Use a Python bridge (mapboxgl)
- Use the fork!
- Smaller data versions available
- Keep using pandas
- Label subproblems and answers
- Due Thursday, October 19

Scaling Up [Haeberlen and Ives, 2015] D. Koop, CIS 602-02, Fall 2015
- PC
- Server
- Cluster
- Data center
- Network of data centers

Scale Problems [Haeberlen and Ives, 2015]
1. Difficult to dimension
- Load can vary considerably
- Waste resources or lose customers
2. Expensive
- Hardware costs
- Personnel costs
- Maintenance costs
3. Difficult to scale
- Scaling up (new machines, new buildings)
- Scaling down (energy, fixed costs)
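The "difficult to dimension" problem can be made concrete with a small calculation. The sketch below is illustrative only: the hourly load figures and the two capacity choices are hypothetical, and load is measured in abstract "machines needed".

```python
def dimensioning_report(hourly_load, capacity):
    """Return (idle, unmet) machine-hours for a fixed capacity.

    idle  = capacity that was paid for but not used (wasted resources)
    unmet = demand that could not be served (lost customers)
    """
    idle = sum(max(capacity - load, 0) for load in hourly_load)
    unmet = sum(max(load - capacity, 0) for load in hourly_load)
    return idle, unmet


# Hypothetical load profile with a sharp peak (machines needed per hour).
hourly_load = [20, 15, 10, 30, 80, 120, 60, 25]

# Sizing for a typical hour versus sizing for the peak.
for capacity in (40, 120):
    idle, unmet = dimensioning_report(hourly_load, capacity)
    print(f"capacity={capacity}: idle={idle} machine-hours, unmet={unmet} machine-hours")
```

With a fixed capacity, either choice loses: sizing for a typical hour leaves demand unserved at the peak, while sizing for the peak leaves most machines idle the rest of the time.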

Power Plant to Cloud Analogy [Haeberlen and Ives, 2015]
[Figure labels: power source (directly connected), network, meter, customer]

Cloud Computing: State-of-the-art and Research Challenges Q. Zhang, L. Cheng, and R. Boutaba

Cloud Computing Features
- No up-front investment
- Lowering operating cost
- Highly scalable
- Easy access
- Reducing business risks and maintenance expenses

Everything as a Service [Haeberlen and Ives, 2015]
- Software as a service (SaaS) [Restaurant]
- Platform as a service (PaaS) [Take-out food]
- Infrastructure as a service (IaaS) [Grocery]
[Figure labels: User, Application, Middleware, Hardware, Cloud provider]
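One way to read the layer diagram on this slide is as a split of responsibility between the cloud provider and the customer. The sketch below encodes that reading; the names LAYERS, PROVIDER_MANAGED, and split_responsibility are hypothetical, and the exact boundary for a given real-world service may differ.

```python
# Stack layers from the slide, bottom to top (the User sits above the Application).
LAYERS = ["Hardware", "Middleware", "Application"]

# Assumed reading of the slide: how many layers, counted from the bottom,
# the cloud provider supplies under each service model.
PROVIDER_MANAGED = {"IaaS": 1, "PaaS": 2, "SaaS": 3}


def split_responsibility(model):
    """Return (provider_layers, customer_layers) for a service model."""
    n = PROVIDER_MANAGED[model]
    return LAYERS[:n], LAYERS[n:]


for model in ("IaaS", "PaaS", "SaaS"):
    provider, customer = split_responsibility(model)
    print(f"{model}: provider={provider}, customer={customer}")
```

This matches the food analogy: with IaaS (grocery) the customer still does most of the work, and with SaaS (restaurant) the customer only consumes the result.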

Cloud Computing Architecture [Zhang et al., 2010]

Business Model [Zhang et al., 2010]
- IaaS and PaaS are often part of the same organization
- Businesses leverage subscription models to support cloud services

Types of Clouds
Public:
- Everyone can use
- Lack fine-grained control over data, network, security settings
Private (aka internal):
- May be homegrown or external
- Offer more control
- Require up-front capital
Hybrid:
- Keep parts secure (e.g. accounting, finances)
- Other pieces can leverage public cloud resources
- Have to determine what goes where

Data Center Network Infrastructure [Zhang et al., 2010]

Virtualization [Haeberlen and Ives, 2015]
[Figure labels: Alice, Bob, Charlie, Daniel; virtual machine monitor; physical machine]
- Hypervisor controls access to resources
- Flexibility for provider
- Secure and isolated
- Performance may be hard to predict

Cloud Comparison
Classes of utility computing:
- Amazon EC2: Infrastructure service
- Windows Azure: Platform service
- Google App Engine: Platform service
Target applications:
- Amazon EC2: General-purpose applications
- Windows Azure: General-purpose Windows applications
- Google App Engine: Traditional web applications with supported framework
Computation:
- Amazon EC2: OS level on a Xen virtual machine
- Windows Azure: Microsoft Common Language Runtime (CLR) VM; predefined roles of app. instances
- Google App Engine: Predefined web application frameworks
Storage:
- Amazon EC2: Elastic Block Store; Amazon Simple Storage Service (S3); Amazon SimpleDB
- Windows Azure: Azure storage service and SQL Data Services
- Google App Engine: BigTable and MegaStore
Auto scaling:
- Amazon EC2: Automatically changing the number of instances based on parameters that users specify
- Windows Azure: Automatic scaling based on application roles and a configuration file specified by users
- Google App Engine: Automatic scaling which is transparent to users
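The auto scaling row describes changing the number of instances based on user-specified parameters. A generic version of that idea can be sketched as below. This is not any provider's actual algorithm: the function name, the target-utilization rule, and the default parameters are all assumptions chosen for illustration.

```python
import math


def desired_instances(current, avg_cpu, target_cpu=0.5,
                      min_instances=1, max_instances=10):
    """Pick an instance count so average CPU utilization moves toward target_cpu.

    current: number of instances running now (must be >= 1)
    avg_cpu: observed average utilization across instances, between 0.0 and 1.0
    target_cpu, min_instances, max_instances: user-specified parameters
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Total work is roughly current * avg_cpu; divide by the per-instance target.
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp to the user-specified bounds.
    return max(min_instances, min(max_instances, wanted))


print(desired_instances(current=4, avg_cpu=0.9))   # overloaded: scale out
print(desired_instances(current=4, avg_cpu=0.1))   # mostly idle: scale in
print(desired_instances(current=10, avg_cpu=0.95)) # already at the upper bound
```

The min and max bounds are the part the user controls most directly: they cap cost on the way up and keep a baseline of capacity on the way down.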

Cloud Challenges and Opportunities [Haeberlen and Ives, 2015]
- Availability: What happens to my business if there is an outage in the cloud?
- Data lock-in: How do I move my data from one cloud to another?
- Data confidentiality and auditability: How do I make sure that the cloud doesn't leak my confidential data?
- Data transfer bottlenecks: How do I copy large amounts of data from/to the cloud?
- Performance unpredictability: VMs sharing the same disk
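The data transfer bottleneck is easy to quantify with back-of-the-envelope arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbit/s = 10^6 bits per second) and a link that sustains its full nominal rate, which real links rarely do; the 10 TB dataset and the link speeds are hypothetical.

```python
def transfer_days(size_terabytes, link_megabits_per_s):
    """Days needed to move a dataset over a link at its full nominal rate."""
    bits = size_terabytes * 1e12 * 8               # terabytes -> bits
    seconds = bits / (link_megabits_per_s * 1e6)   # bits / (bits per second)
    return seconds / 86400                         # seconds -> days


# Hypothetical example: a 10 TB dataset over two link speeds.
for mbps in (100, 1000):
    print(f"10 TB at {mbps} Mbit/s: {transfer_days(10, mbps):.2f} days")
```

At 100 Mbit/s the copy takes more than nine days even under ideal conditions, which is why moving large datasets into or out of a cloud is listed as a challenge in its own right.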

Cloud Challenges and Opportunities
- Scalable storage: Cloud model (short-term usage, no up-front cost, infinite capacity on demand) does not fit persistent storage well
- Bugs in large distributed systems: Many errors cannot be reproduced in smaller configs
- Scaling quickly: Dealing with boot times and idle power
- Reputation fate sharing: One customer's bad behavior can affect the reputation of others using the same cloud (e.g. spam, FBI raids)
- Software licensing: Licenses tied to computers, how to scale?

Cloud Comparer

Cloud Adoption

Cloud Strategy

Cloud Tools

Cloud Challenges

Cloud Initiatives

Public Cloud Adoption

Public Cloud Change

Private Cloud Adoption

Wasted Cloud Spending

Amazon Web Services 101 I. Massingham
