Cloud Based Big Data Analytic: A Review


International Journal of Cloud-Computing and Super-Computing, Vol. 3, No. 1, (2016), pp. 7-12

A.S. Manekar 1 and G. Pradeepini 2
1 Research Scholar, K L University, Vijaywada, A.P
2 Department of CSE, KL University, Vijaywada, A.P
1 asmanekar24@gmail.com, 2 pradeepini.gera@gmail.com

Abstract

Cloud computing is a complex architecture for sharing computing resources in support of increasingly advanced applications. In short, "cloud" means the Internet. The computing industry has already started using cloud infrastructure to advance business, and big players in every sector have already moved their businesses onto cloud-based infrastructure and applications. This infrastructure and these applications generate huge amounts of data, which are useful for prediction. Emerging applications such as online shopping, weather forecasting, social media sites and many more depend on big data analytics for their predictions. Many researchers and companies are looking for ways to combine big data analytics with the cloud. Big data is a term for data characterized by large volume, velocity and variety. In this paper we review several big data processing techniques from the system and application perspectives. We also discuss some of the challenges that remain for future work in this area given today's technology. The main focus is on key issues such as big data processing, cloud computing platforms, cloud infrastructure and resources, data management in the cloud, and methods of accessing these databases. Security is a prime concern in every respect when moving big data analytics to the cloud. Finally, the concluding section discusses current issues and challenges, the MapReduce parallel processing approach, and the integration of distributed data processing into cloud-based infrastructure, together with future research directions for big data processing in cloud infrastructure.
Keywords: Big Data, Cloud Computing, Cost Minimization, Progressive analytics, Hadoop, MapReduce, HDFS.

Article history: Received (February 15, 2016), Review Result (April 20, 2016), Accepted (May 18, 2016)
ISSN: IJCS
Copyright © 2016 GV School Publication

1. Introduction

Today's cloud environment is a general alternative for delivering computer processing, storage, and software from terminals and servers, across a high-speed backbone network, to next-generation data centers. All data mining applications have the potential to be deployed in or migrated to the cloud. Enterprises have huge investments in software and hardware. If this massive amount of enterprise data is transferred to the cloud, companies, especially small-scale businesses or those with small budgets, can adopt the pay-as-you-go cloud computing model. Pay-as-you-go (PAYG) is an architecture in which the cloud acts as an Application Service Provider (ASP) [1]. If all data mining and data-intensive enterprises adopt this architecture, databases will be offered as services, i.e. Database as a Service (DaaS). In this new paradigm of cloud computing, many vendors typically maintain the hardware and provide customers with a virtual machine on which to install their own software. This kind of elastic availability of resources, with a seemingly infinite amount of processing power and storage available on demand, comes with a pay-only-for-what-you-use pricing model [1]. For producers, on the other hand, the cloud is about the technology that goes into providing service offerings at each level.

Big data, in which the volume, velocity and variety of data are all very large, is one side of the coin; on the other side, the data also has value. We can say that we are in the era of the data deluge when we think about all these V's, i.e. volume, variety and velocity, with respect to value. Data is a record of what happened. Algorithms that perform pricing, ad targeting, inventory management, and fraud detection all generate data about their own performance, which can in turn be used to improve that performance [7].

2. Literature review

Researchers have witnessed several advances in the computational era, which have ultimately brought us into the world of high-performance computing. Eli Collins of Cloudera, in his article "Intersection of the Cloud and Big Data", explains that several macro trends are at work in the cloud. The first is consumption: we consume data as a part of our daily activities all the time. The second trend is instrumentation: we collect data at each step in many of our activities, and much of it is now produced by machines instead of people. The third trend is exploration: the relatively easy access to this abundance of data means we can use it to construct, test, and consume experiments that were previously not feasible. Big data trends also play an important role here, e.g. recent advances in the Apache Hadoop ecosystem.

Stephen J. Andriole and Irena Bojanova focus on revenue generation, which will be further supported by developments in interoperability between clouds, allowing companies to scale a service across disparate providers while the service appears to operate as one system. Cloud federation will also support revenue generation by interconnecting the cloud services of different providers and of disparate networks.

IaaS (Infrastructure as a Service) cloud providers offer computation and storage resources to third parties [9]. They allow customers to deploy VMs based on predefined virtual images, as well as persistent storage devices. In addition to providing computing and storage as a service, providers offer further services such as key management, although these do not provide functional building blocks for setting up a new era of big data processing data centers. Some attempts have been made at setting up Hadoop in the cloud (see [5] and OpenStack Sahara). The main technique for data crunching has been to move the data to the computational nodes in a shared architecture [2]. There are also systems, such as Mesos [3] and YARN (Yet Another Resource Negotiator) [4], whose goal is to allow cluster sharing between different applications, improving cluster utilization and avoiding per-framework data replication. Data loading is the primary and most critical task when developing big data applications in the cloud or migrating big data to it; it involves many steps, such as partitioning, data distribution, application configuration, and loading data into memory [6].

3. Methods and techniques

This section first discusses the techniques used to migrate big data into a cloud environment and then the analytical techniques available for big data analytics. Migrating huge amounts of data into the cloud requires several techniques, namely partitioning, data distribution, application configuration, and loading data into memory, which are explained below.

Partitioning: The data set is split and assigned to the workers, so that data processing can occur in parallel.
Data distribution: Data is distributed to the VM where it is going to be processed.
Application configuration: The VMs have the big data applications correctly configured.
Load data in memory: In some computing models, during job preparation, the data must be loaded from the hard disk to RAM [1].

After these steps, the data can be handed over for big data analytics. Hadoop, an open-source technology, together with HDFS (the Hadoop Distributed File System), can be used for this analytical purpose, as shown in Fig. 1, which illustrates the transfer of big data into the cloud.

Figure 1. Towards cloud migration of big data

Big Data and Cloud, two of the trends that are shaping up-and-coming enterprise computing, show a lot of potential for a new era of combined applications. Providing big data analytical capabilities through cloud delivery models could ease adoption for many companies and, in addition to important cost savings, could make it easier to obtain useful insights that give them different kinds of competitive advantage. Fig. 2 shows how big data is moved into the cloud.

Hadoop is a free, open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
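The four migration steps listed in this section (partitioning, data distribution, application configuration, and loading data into memory) can be illustrated with a small single-machine sketch in Python. This is only an illustration under stated assumptions: the function names, the VM names, the round-robin partitioning rule and the `settings` dictionary are all hypothetical and are not taken from the paper or from any specific cloud API.

```python
from typing import Dict, List, Tuple


def partition(records: List[str], num_workers: int) -> List[List[str]]:
    """Partitioning: split the data set into one chunk per worker (round-robin, an assumed rule)."""
    chunks: List[List[str]] = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        chunks[i % num_workers].append(record)
    return chunks


def distribute(chunks: List[List[str]], vm_names: List[str]) -> Dict[str, List[str]]:
    """Data distribution: assign each chunk to the VM that will process it."""
    if len(chunks) != len(vm_names):
        raise ValueError("need exactly one chunk per VM")
    return dict(zip(vm_names, chunks))


def configure(vm_names: List[str], settings: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Application configuration: give every VM its own copy of the (hypothetical) settings."""
    return {vm: dict(settings) for vm in vm_names}


def load_in_memory(assignment: Dict[str, List[str]]) -> Dict[str, Tuple[str, ...]]:
    """Load data in memory: stand-in for reading each VM's chunk from disk into RAM."""
    return {vm: tuple(chunk) for vm, chunk in assignment.items()}


# Hypothetical input: ten log lines and three VMs.
records = [f"log line {i}" for i in range(10)]
vms = ["vm-0", "vm-1", "vm-2"]

chunks = partition(records, len(vms))
assignment = distribute(chunks, vms)
configs = configure(vms, {"framework": "hadoop"})
in_memory = load_in_memory(assignment)

for vm in vms:
    print(vm, len(in_memory[vm]), "records", configs[vm])
```

In a real deployment each step would involve network transfer and per-VM provisioning; the sketch only shows the order of the steps and the data that flows between them.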

Figure 2. Big data in cloud environment - cloud as a service

MapReduce - Classic big data applications use the MapReduce abstraction for crunching different data sources (e.g. log files). MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. Hadoop MapReduce is a distributed framework that runs on a cluster of commodity hardware. Its main responsibilities are scheduling tasks, monitoring them, and rescheduling any task that fails.

Hadoop Distributed File System - The Hadoop Distributed File System (HDFS) is a subproject of the Apache Hadoop project. It is designed to provide a fault-tolerant file system that runs on commodity hardware. HDFS is essentially a scalable, fault-tolerant, distributed storage system, and it works very closely with MapReduce in a distributed environment.

4. Discussion

Drawing a formal conclusion from this review is very hard; hence we would like to discuss some points instead. Cloud computing is rapidly becoming a new computing paradigm for processing and operating on big data. The current practice is to copy the data onto large hard drives as a physical repository and to physically transport those drives to the data center or other location where the data is processed. Sometimes entire machines and systems need to be transferred. The challenges escalate when we consider data that may or may not be progressive and that is generated at different locations. With Hadoop, MapReduce and HDFS as the primary solutions, we can build a system that migrates this data processing to the cloud in the near future.
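To make the MapReduce model described in Section 3 more concrete, the following is a minimal single-process sketch in Python of the map, shuffle and reduce phases applied to log files, the example data source mentioned above. It is not Hadoop's API: the function names, the sample log lines and the assumption that the log level is the first token of each line are all hypothetical, and the scheduling, monitoring and rescheduling that Hadoop MapReduce performs on a cluster are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (log_level, 1) for one log line; the level is assumed to be the first token."""
    parts = line.split()
    if parts:
        yield parts[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts for one key."""
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    """Run the three phases in sequence; on a cluster each split would be a separate map task."""
    intermediate: List[Tuple[str, int]] = []
    for split in splits:
        for line in split:
            intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: two splits of a log file.
splits = [
    ["ERROR disk full", "INFO job started"],
    ["INFO job finished", "WARN slow node", "ERROR task failed"],
]

counts = run_job(splits)
print(counts)
```

Because the map function looks at one line at a time and the reduce function at one key at a time, both phases can be spread across many machines, which is what allows the model to run in parallel on a cluster.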

References

[1] D.J. Abadi, Data Management in Cloud: Limitations and Opportunities, Bulletin of the IEEE Computer Technical Committee on Data Engineering, pp. 1-10, (2009).
[2] I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., (2003).
[3] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A.D. Joseph, R. Katz, S. Shenker, and I. Stoica, Mesos: A Platform for Fine-grained Resource Sharing in the Data Center, in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, ser. NSDI '11, USENIX Association, Berkeley, CA, USA, (2011). Available: http://dl.acm.org/citation.cfm?id=
[4] V.K. Vavilapalli, A.C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, S. Seth, B. Saha, C. Curino, O. O'Malley, S. Radia, B. Reed, and E. Baldeschwieler, Apache Hadoop YARN: Yet Another Resource Negotiator, in Proceedings of the 4th Annual Symposium on Cloud Computing, ser. SoCC '13, ACM, New York, NY, USA, pp. 5:1-5:16, (2013).
[5] S. Loughran, J. Alcaraz Calero, A. Farrell, J. Kirschnick, and J. Guijarro, Dynamic Cloud Deployment of a MapReduce Architecture, Internet Computing, IEEE, Vol. 16, No. 6, Nov. (2012).
[6] L. Vaquero and F. Cuadrado, Deploying Large-Scale Data Sets on-demand in the Cloud: Treats and Tricks on Data Distribution, Transactions on Cloud Computing, Vol. Aa, No. B, (2014).
[7]
[8] Z. Zeng, B. Wu, and H. Wang, A Parallel Graph Partitioning Algorithm to Speed up the Large-scale Distributed Graph Mining, in Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, ser. BigMine '12, ACM, New York, NY, USA, (2012).
[9] L.M. Vaquero, F. Cuadrado, and M. Ripeanu, Systems for Near Real-time Analysis of Large-scale Dynamic Graphs, in Submitted for publication, ser. xxx 14, (2014).
[10] L.M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, A Break in the Clouds: Towards a Cloud Definition, SIGCOMM Computer Communication Review, Vol. 39, No. 1, (2008).
[11] Official sites of Hadoop.

6 Cloud Based Big Data Analytic: A Review 12 A.S. Manekar and G. Pradeepini

Resource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator

Resource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator Resource Scheduling Architectural Evolution at Scale and Distributed Scheduler Load Simulator Renyu Yang Supported by Collaborated 863 and 973 Program Resource Scheduling Problems 2 Challenges at Scale

More information

A Preemptive Fair Scheduler Policy for Disco MapReduce Framework

A Preemptive Fair Scheduler Policy for Disco MapReduce Framework A Preemptive Fair Scheduler Policy for Disco MapReduce Framework Augusto Souza 1, Islene Garcia 1 1 Instituto de Computação Universidade Estadual de Campinas Campinas, SP Brasil augusto.souza@students.ic.unicamp.br,

More information

A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems

A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems A Hybrid Scheduling Approach for Scalable Heterogeneous Hadoop Systems Aysan Rasooli Department of Computing and Software McMaster University Hamilton, Canada Email: rasooa@mcmaster.ca Douglas G. Down

More information

Spark, Hadoop, and Friends

Spark, Hadoop, and Friends Spark, Hadoop, and Friends (and the Zeppelin Notebook) Douglas Eadline Jan 4, 2017 NJIT Presenter Douglas Eadline deadline@basement-supercomputing.com @thedeadline HPC/Hadoop Consultant/Writer http://www.basement-supercomputing.com

More information

5th Annual. Cloudera, Inc. All rights reserved.

5th Annual. Cloudera, Inc. All rights reserved. 5th Annual 1 The Essentials of Apache Hadoop The What, Why and How to Meet Agency Objectives Sarah Sproehnle, Vice President, Customer Success 2 Introduction 3 What is Apache Hadoop? Hadoop is a software

More information

A Novel Multilevel Queue based Performance Analysis of Hadoop Job Schedulers

A Novel Multilevel Queue based Performance Analysis of Hadoop Job Schedulers Indian Journal of Science and Technology, Vol 9(44), DOI: 10.17485/ijst/2016/v9i44/96414, November 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 A Novel Multilevel Queue based Analysis of Hadoop

More information

Data Management in the Cloud CISC 878. Patrick Martin Goodwin 630

Data Management in the Cloud CISC 878. Patrick Martin Goodwin 630 Data Management in the Cloud CISC 878 Patrick Martin Goodwin 630 martin@cs.queensu.ca Learning Objectives Students should understand the motivation for, and the costs/benefits of, cloud computing. Students

More information

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

E-guide Hadoop Big Data Platforms Buyer s Guide part 1 Hadoop Big Data Platforms Buyer s Guide part 1 Your expert guide to Hadoop big data platforms for managing big data David Loshin, Knowledge Integrity Inc. Companies of all sizes can use Hadoop, as vendors

More information

Simplifying Hadoop. Sponsored by. July >> Computing View Point

Simplifying Hadoop. Sponsored by. July >> Computing View Point Sponsored by >> Computing View Point Simplifying Hadoop July 2013 The gap between the potential power of Hadoop and the technical difficulties in its implementation are narrowing and about time too Contents

More information

Intro to Big Data and Hadoop

Intro to Big Data and Hadoop Intro to Big and Hadoop Portions copyright 2001 SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC, USA. SAS Institute Inc. makes no warranties

More information

Research of the Social Media Data Analyzing Platform Based on Cloud Mining Yi-Tang ZENG, Yu-Feng ZHANG, Sheng CAO, Li LI, Cheng-Wei ZHANG *

Research of the Social Media Data Analyzing Platform Based on Cloud Mining Yi-Tang ZENG, Yu-Feng ZHANG, Sheng CAO, Li LI, Cheng-Wei ZHANG * 2016 3 rd International Conference on Social Science (ICSS 2016) ISBN: 978-1-60595-410-3 Research of the Social Media Data Analyzing Platform Based on Cloud Mining Yi-Tang ZENG, Yu-Feng ZHANG, Sheng CAO,

More information

StackIQ Enterprise Data Reference Architecture

StackIQ Enterprise Data Reference Architecture WHITE PAPER StackIQ Enterprise Data Reference Architecture StackIQ and Hortonworks worked together to Bring You World-class Reference Configurations for Apache Hadoop Clusters. Abstract Contents The Need

More information

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform

Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform Achieving Agility and Flexibility in Big Data Analytics with the Urika -GX Agile Analytics Platform Analytics R&D and Product Management Document Version 1 WP-Urika-GX-Big-Data-Analytics-0217 www.cray.com

More information

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS) Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how Dell EMC Elastic Cloud Storage (ECS ) can be used to streamline

More information

HPC IN THE CLOUD Sean O Brien, Jeffrey Gumpf, Brian Kucic and Michael Senizaiz

HPC IN THE CLOUD Sean O Brien, Jeffrey Gumpf, Brian Kucic and Michael Senizaiz HPC IN THE CLOUD Sean O Brien, Jeffrey Gumpf, Brian Kucic and Michael Senizaiz Internet2, Case Western Reserve University, R Systems NA, Inc 2015 Internet2 HPC in the Cloud CONTENTS Introductions and Background

More information

Big Data The Big Story

Big Data The Big Story Big Data The Big Story Jean-Pierre Dijcks Big Data Product Mangement 1 Agenda What is Big Data? Architecting Big Data Building Big Data Solutions Oracle Big Data Appliance and Big Data Connectors Customer

More information

CONVERGENCE OF CLOUD COMPUTING, SERVICE ORIENTED ARCHITECTURE AND ENTERPRISE ARCHITECTURE

CONVERGENCE OF CLOUD COMPUTING, SERVICE ORIENTED ARCHITECTURE AND ENTERPRISE ARCHITECTURE CONVERGENCE OF CLOUD COMPUTING, SERVICE ORIENTED ARCHITECTURE AND ENTERPRISE ARCHITECTURE Susan Sutherland (nee Rao) University of Canberra PO Box 148, Jamison Centre, ACT 2614, Australia Susan.sutherland@canberra.edu.au

More information

Engaging in Big Data Transformation in the GCC

Engaging in Big Data Transformation in the GCC Sponsored by: IBM Author: Megha Kumar December 2015 Engaging in Big Data Transformation in the GCC IDC Opinion In a rapidly evolving IT ecosystem, "transformation" and in some cases "disruption" is changing

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 19 1 Acknowledgement The following discussion is based on the paper Mining Big Data: Current Status, and Forecast to the Future by Fan and Bifet and online presentation

More information

Adobe Deploys Hadoop as a Service on VMware vsphere

Adobe Deploys Hadoop as a Service on VMware vsphere Adobe Deploys Hadoop as a Service A TECHNICAL CASE STUDY APRIL 2015 Table of Contents A Technical Case Study.... 3 Background... 3 Why Virtualize Hadoop on vsphere?.... 3 The Adobe Marketing Cloud and

More information

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes Contents Introduction...3 Hadoop s humble beginnings...4 The benefits of Hadoop...5

More information

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation Roger Ding Cloudera February 3rd, 2018 1 Agenda Hadoop History Introduction to Apache Hadoop

More information

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW TOPICS COVERED 1 2 Fundamentals of Big Data Platforms Major Big Data Tools Scaling Up vs. Out SCALE UP (SMP) SCALE OUT (MPP) + (n) Upgrade

More information

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK Are you drowning in Big Data? Do you lack access to your data? Are you having a hard time managing Big Data processing requirements?

More information

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN

From Information to Insight: The Big Value of Big Data. Faire Ann Co Marketing Manager, Information Management Software, ASEAN From Information to Insight: The Big Value of Big Data Faire Ann Co Marketing Manager, Information Management Software, ASEAN The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics Nov.

Outline of Hadoop. Background, Core Services, and Components. David Schwab Synchronic Analytics   Nov. Outline of Hadoop Background, Core Services, and Components David Schwab Synchronic Analytics https://synchronicanalytics.com Nov. 1, 2018 Hadoop s Purpose and Origin Hadoop s Architecture Minimum Configuration

More information

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer Redefine Big Data: EMC Data Lake in Action Andrea Prosperi Systems Engineer 1 Agenda Data Analytics Today Big data Hadoop & HDFS Different types of analytics Data lakes EMC Solutions for Data Lakes 2 The

More information

Oracle Big Data Cloud Service

Oracle Big Data Cloud Service Oracle Big Data Cloud Service Delivering Hadoop, Spark and Data Science with Oracle Security and Cloud Simplicity Oracle Big Data Cloud Service is an automated service that provides a highpowered environment

More information

DOWNTIME IS NOT AN OPTION

DOWNTIME IS NOT AN OPTION DOWNTIME IS NOT AN OPTION HOW APACHE MESOS AND DC/OS KEEPS APPS RUNNING DESPITE FAILURES AND UPDATES 2017 Mesosphere, Inc. All Rights Reserved. 1 WAIT, WHO ARE YOU? Engineer at Mesosphere DC/OS Contributor

More information

Modernizing Your Data Warehouse with Azure

Modernizing Your Data Warehouse with Azure Modernizing Your Data Warehouse with Azure Big data. Small data. All data. Christian Coté S P O N S O R S The traditional BI Environment The traditional data warehouse data warehousing has reached the

More information

Virtualizing Big Data/Hadoop Workloads. Update for vsphere 6. Justin Murray VMware VMware Inc. All rights reserved.

Virtualizing Big Data/Hadoop Workloads. Update for vsphere 6. Justin Murray VMware VMware Inc. All rights reserved. Virtualizing Big Data/Hadoop Workloads Update for vsphere 6 Justin Murray VMware 2014 VMware Inc. All rights reserved. Agenda The Hadoop Customer Journey Why Virtualize Hadoop? vsphere Big Data Extensions

More information

BIG DATA AND HADOOP DEVELOPER

BIG DATA AND HADOOP DEVELOPER BIG DATA AND HADOOP DEVELOPER Approximate Duration - 60 Hrs Classes + 30 hrs Lab work + 20 hrs Assessment = 110 Hrs + 50 hrs Project Total duration of course = 160 hrs Lesson 00 - Course Introduction 0.1

More information

HadoopWeb: MapReduce Platform for Big Data Analysis

HadoopWeb: MapReduce Platform for Big Data Analysis HadoopWeb: MapReduce Platform for Big Data Analysis Saloni Minocha 1, Jitender Kumar 2,s Hari Singh 3, Seema Bawa 4 1Student, Computer Science Department, N.C. College of Engineering, Israna, Panipat,

More information

PROCESSOR LEVEL RESOURCE-AWARE SCHEDULING FOR MAPREDUCE IN HADOOP

PROCESSOR LEVEL RESOURCE-AWARE SCHEDULING FOR MAPREDUCE IN HADOOP PROCESSOR LEVEL RESOURCE-AWARE SCHEDULING FOR MAPREDUCE IN HADOOP G.Hemalatha #1 and S.Shibia Malar *2 # P.G Student, Department of Computer Science, Thirumalai College of Engineering, Kanchipuram, India

More information

Special thanks to Chad Diaz II, Jason Montgomery & Micah Torres

Special thanks to Chad Diaz II, Jason Montgomery & Micah Torres Special thanks to Chad Diaz II, Jason Montgomery & Micah Torres Outline: What cloud computing is The history of cloud computing Cloud Services (Iaas, Paas, Saas) Cloud Computing Service Providers Technical

More information

SAS and Hadoop Technology: Overview

SAS and Hadoop Technology: Overview SAS and Hadoop Technology: Overview SAS Documentation September 19, 2017 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview.

More information

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7 Contents at a Glance Introduction... 1 Part I: Getting Started with Big Data... 7 Chapter 1: Grasping the Fundamentals of Big Data...9 Chapter 2: Examining Big Data Types...25 Chapter 3: Old Meets New:

More information

Integrating MATLAB Analytics into Enterprise Applications

Integrating MATLAB Analytics into Enterprise Applications Integrating MATLAB Analytics into Enterprise Applications David Willingham 2015 The MathWorks, Inc. 1 Run this link. http://bit.ly/matlabapp 2 Key Takeaways 1. What is Enterprise Integration 2. What is

More information

Scheduling Techniques for Workload Distribution in YARN Containers

Scheduling Techniques for Workload Distribution in YARN Containers Scheduling Techniques for Workload Distribution in YARN Containers 1 Rajneesh Kumar, 2 Dr.S.Govindarajan 1 Student, 2 Professor Department of Computer Application, Faculty of Engineering and Technology,

More information

Analytics Platform System

Analytics Platform System Analytics Platform System Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com Ofc 425-538-0044, Cell 303-324-2860 Sean Mikha, DW & Big Data Architect semikha@microsoft.com

More information

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia MapR: Converged Data Pla3orm and Quick Start Solu;ons Robin Fong Regional Director South East Asia Who is MapR? MapR is the creator of the top ranked Hadoop NoSQL SQL-on-Hadoop Real Database time streaming

More information

The Sysprog s Guide to the Customer Facing Mainframe: Cloud / Mobile / Social / Big Data

The Sysprog s Guide to the Customer Facing Mainframe: Cloud / Mobile / Social / Big Data Glenn Anderson, IBM Lab Services and Training The Sysprog s Guide to the Customer Facing Mainframe: Cloud / Mobile / Social / Big Data Summer SHARE August 2015 Session 17794 2 (c) Copyright 2015 IBM Corporation

More information

Top six performance challenges in managing microservices in a hybrid cloud

Top six performance challenges in managing microservices in a hybrid cloud Top six performance challenges in managing microservices in a hybrid cloud Table of Contents Top six performance challenges in managing microservices in a hybrid cloud Introduction... 3 Chapter 1: Managing

More information

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica Accelerating Your Big Data Analytics Jeff Healey, Director Product Marketing, HPE Vertica Recent Waves of Disruption IT Infrastructu re for Analytics Data Warehouse Modernization Big Data/ Hadoop Cloud

More information

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse

IBM Db2 Warehouse. Hybrid data warehousing using a software-defined environment in a private cloud. The evolution of the data warehouse IBM Db2 Warehouse Hybrid data warehousing using a software-defined environment in a private cloud The evolution of the data warehouse Managing a large-scale, on-premises data warehouse environments to

More information

Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview

Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview Rapid Start with Big Data Appliance X6-2 Technical & Operational Overview Dirk Augustin Solution Architect Hardware Presales Germany The Realities of Today s Data Center... Accelerating Customer Expectations

More information

Bringing the Power of SAS to Hadoop Title
