Adopting Big Data Technologies in the Support of Official Statistical Production: Opportunities, Experiences and Lessons Learned

Antonino Virgillito, Istat (Istituto Nazionale di Statistica), 31/07/2017

Introduction
The use of Big Data sources in the production of official statistics has been at the core of several national and international initiatives in recent years. Among the many questions raised by the use of Big Data for statistics, one specifically concerns the novel IT tools available for handling Big Data. In this presentation we give an overview of Big Data technology and present the different ways in which it can support statistical production, analysing the experiences gained at Istat and at international level.

Overview of Big Data Tools
Big Data tools are specifically designed to cope with aspects such as the large size of data and its loose structure. What are the characteristics of these tools? How can they be used in the context of statistical production?

Categories of Big Data Tools: Distributed Computing Platforms
Clusters of interconnected machines that work as a whole to store and process data. Examples: Hadoop, Spark.
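
To make the idea of a cluster "working as a whole" more concrete, the sketch below imitates in plain Python the map, shuffle and reduce phases popularised by Hadoop MapReduce. It is not the Hadoop or Spark API: the record layout (store_id,product_code,quantity), the function names and the data are hypothetical, and each inner list merely stands in for a data block stored on a separate node.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical record layout: "store_id,product_code,quantity".
# Each inner list stands in for a data block stored on a different node.


def map_phase(block: list[str]) -> list[tuple[str, int]]:
    """Emit (product_code, quantity) pairs for every record in one block."""
    pairs = []
    for record in block:
        _store, product, quantity = record.split(",")
        pairs.append((product, int(quantity)))
    return pairs


def shuffle_phase(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Group values by key; on a real cluster this step moves data between nodes."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine all values of each key into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(blocks: list[list[str]]) -> dict[str, int]:
    # Map tasks are independent, so a framework could run one per node in parallel.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle_phase(mapped))


blocks = [
    ["s1,milk,3", "s1,bread,2"],
    ["s2,milk,5", "s2,eggs,12"],
]
print(map_reduce(blocks))  # {'milk': 8, 'bread': 2, 'eggs': 12}
```

Because each map task only sees its own block, a framework can schedule it on the node that already holds the data; only the shuffle moves intermediate results across the network.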

Categories of Big Data Tools: Massively Parallel Processing (MPP) Databases
Provide real-time querying with fast response times over large datasets. Examples: Dremel, Drill, Impala.
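
A common way for MPP engines to answer an aggregate query is to let every worker aggregate its own slice of the table and then merge the partial results. The sketch below shows this scatter-gather pattern for an average, assuming a hypothetical table of (region, price) rows split into three partitions; threads stand in for worker nodes, and none of the names belong to Dremel, Drill or Impala.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split across three workers: (region, price) rows.
PARTITIONS = [
    [("north", 10.0), ("south", 4.0)],
    [("north", 20.0)],
    [("south", 8.0), ("south", 6.0)],
]


def partial_aggregate(rows):
    """Worker step: per-region (sum, count) computed over the local partition only."""
    acc = {}
    for region, price in rows:
        s, c = acc.get(region, (0.0, 0))
        acc[region] = (s + price, c + 1)
    return acc


def merge(partials):
    """Coordinator step: combine partial states, then finalise AVG = sum / count."""
    total = {}
    for part in partials:
        for region, (s, c) in part.items():
            ts, tc = total.get(region, (0.0, 0))
            total[region] = (ts + s, tc + c)
    return {region: s / c for region, (s, c) in total.items()}


def avg_price_by_region(partitions):
    # Each worker scans its own partition concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    return merge(partials)


print(avg_price_by_region(PARTITIONS))  # {'north': 15.0, 'south': 6.0}
```

The workers return (sum, count) pairs rather than local averages: averaging the per-partition averages would give 5.5 for south instead of the correct 6.0.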

Categories of Big Data Tools: NoSQL Databases
Not based on the traditional tabular data model, but capable of handling non-structured data. Examples: HBase, MongoDB, Elasticsearch.
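
The difference from the tabular model can be sketched with plain JSON documents. The records, field names and helper functions below are hypothetical and only imitate the document model of stores such as MongoDB; they are not the API of any of the products listed.

```python
import json

# Hypothetical documents: unlike rows of a relational table,
# they do not have to share the same set of fields.
raw = [
    '{"id": 1, "source": "web", "price": 2.5, "tags": ["food", "dairy"]}',
    '{"id": 2, "source": "scanner", "price": 1.2, "store": {"city": "Rome"}}',
    '{"id": 3, "source": "web", "text": "no price published"}',
]
documents = [json.loads(line) for line in raw]

_MISSING = object()  # sentinel: a missing field never equals a criterion


def find(docs, **criteria):
    """Return the documents whose top-level fields equal every criterion."""
    return [
        d for d in docs
        if all(d.get(field, _MISSING) == value for field, value in criteria.items())
    ]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'store.city' into nested documents."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


web_ids = [d["id"] for d in find(documents, source="web")]  # [1, 3]
cities = [get_path(d, "store.city") for d in documents]     # [None, 'Rome', None]
```

No schema is declared up front, so a field such as store can appear in some documents without altering the others; the price of this flexibility is that every query must cope with missing or differently shaped fields.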

Uses of Big Data Tools
RDBMS offload, Big Data staging, experiments, heavy processing.
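
One common reading of "Big Data staging" is that raw, voluminous data lands first on the Big Data platform, where it is cleaned and aggregated, and only the much smaller result is loaded into the traditional RDBMS. The sketch below assumes hypothetical mobile-phone-style events (user, area, day) and uses Python's built-in sqlite3 module as a stand-in for the relational database; it is an illustration, not the pipeline used at Istat.

```python
import sqlite3
from collections import Counter

# Hypothetical raw events as they might land in the staging area:
# (user_id, area_code, day). Real volumes would be too large for the RDBMS.
raw_events = [
    ("u1", "A01", "2017-07-01"),
    ("u2", "A01", "2017-07-01"),
    ("u1", "A01", "2017-07-01"),  # repeated event from the same user
    ("u3", "B02", "2017-07-01"),
]


def stage(events):
    """Staging step: keep distinct users per area and day, then count them."""
    distinct = {(area, day, user) for user, area, day in events}
    return Counter((area, day) for area, day, _user in distinct)


def load(counts):
    """Load only the small aggregate into the relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE presence (area TEXT, day TEXT, users INTEGER)")
    conn.executemany(
        "INSERT INTO presence VALUES (?, ?, ?)",
        [(area, day, n) for (area, day), n in counts.items()],
    )
    return conn


conn = load(stage(raw_events))
rows = conn.execute("SELECT area, users FROM presence ORDER BY area").fetchall()
print(rows)  # [('A01', 2), ('B02', 1)]
```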

Case Study: UNECE Sandbox
A web-based collaborative environment hosted in Ireland by ICHEC (Irish Centre for High-End Computing), set up within the UNECE Big Data Project of the High-Level Group for the Modernisation of Official Statistics (HLG-MOS). Its objective is to better understand how to use the power of Big Data to support the production of official statistics, and both tools and data sources are available on it. It is currently used as a training platform for the ESTP courses on Big Data and as a shared test environment for the ESS Big Data project promoted by Eurostat.

Case Study: Istat Big Data Platform
An in-house Big Data platform: an 8-node Hadoop cluster plus Spark and an MPP database. It is designed for use in both production and experimental projects, such as scanner data and population estimates based on mobile phone data. The platform is entirely hosted on-premises and managed by internal staff, a choice motivated by privacy constraints on the datasets. This is a costly solution, so cloud options should be evaluated when possible.

Lessons Learned: The Problem with Size
Volume is the dimension with the highest practical impact in real Big Data projects: datasets are in the order of terabytes and constantly growing. There is no absolute threshold above which data becomes "big"; it depends strictly on the kind of processing the data must undergo. Real datasets are smaller than what many companies are used to handling daily, but such volumes are not common for an NSI.
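
The point that "big" depends on the processing, and not only on the volume, can be shown with a back-of-the-envelope calculation. The record count and throughput below are hypothetical round numbers, not measurements: the same dataset that a single linear pass handles in a fraction of a second becomes impractical on one machine once every record must be compared with every other one, as in naive record linkage.

```python
# Hypothetical round numbers, used only for illustration.
RECORDS = 10_000_000          # rows in the dataset
OPS_PER_SECOND = 100_000_000  # assumed operations per second on one machine


def linear_cost(n):
    """Operations for one pass over the data (filter, simple aggregate)."""
    return n


def pairwise_cost(n):
    """Operations to compare each record with every other one (naive linkage)."""
    return n * (n - 1) // 2


for name, cost in [("linear scan", linear_cost), ("pairwise comparison", pairwise_cost)]:
    seconds = cost(RECORDS) / OPS_PER_SECOND
    print(f"{name}: {seconds:.1f} s ({seconds / 3600:.1f} h)")
```

Under these assumptions the linear pass takes about 0.1 s, while the pairwise comparison needs about 5 × 10^5 s, nearly six days, even though the input size is identical.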

Lessons Learned: IT Tools as Enablers of Advanced Analytics
Big Data tools enable operations and analyses that are not possible with standard tools, or they can provide better performance than traditional tools when dealing with large datasets.
Figure: example of execution times of real operations in the Istat Scanner Data project, Big Data tools vs RDBMS.

Lessons Learned: Skills and Roles
The relationship between IT staff and statisticians within a statistical organization is a particularly critical issue when working with Big Data. How can the potential of technology be exploited without compromising the autonomy of research? A complete separation of concerns, as commonly experienced in NSIs, may lead to inefficiencies. Capacity building is crucial, and a new mentality is needed: a mix of competences and a collaborative approach.

Conclusions
The rise of Big Data has created new problems for statistical organizations, problems that lie at the intersection of statistical analysis and IT. Technology is now mature enough to offer huge potential and to help organizations answer new and more complex questions. A paradigm shift is needed in the approach to those aspects of the statistical business that must exploit new technologies, with a more cohesive and mixed approach between IT and statisticians. No matter how big the datasets we manage, they are only one part of the general problem of how statistical organizations can make the transition from traditional methods to the modern challenges of data science.

More information

BIG DATA and DATA SCIENCE

BIG DATA and DATA SCIENCE Integrated Program In BIG DATA and DATA SCIENCE CONTINUING STUDIES Table of Contents About the Course...03 Key Features of Integrated Program in Big Data and Data Science...04 Learning Path...05 Key Learning

More information

Ray M Sugiarto MAPR Champion Indonesia

Ray M Sugiarto MAPR Champion Indonesia Ray M Sugiarto MAPR Champion Indonesia 0815 167 2882 2015 MapR Technologies 2015 MapR Technologies 1 Why Big Data? University of Texas: The median Fortune 1000 company could increase its revenue by more

More information

Developing an analytics everywhere framework for the Internet of Things. Ph.D. Research Proposal by Hung Cao

Developing an analytics everywhere framework for the Internet of Things. Ph.D. Research Proposal by Hung Cao Developing an analytics everywhere framework for the Internet of Things Ph.D. Research Proposal by Hung Cao Research Motivation The Internet of Things IoT devices require pushing the data streams from

More information

EXAMPLE SOLUTIONS Hadoop in Azure HBase as a columnar NoSQL transactional database running on Azure Blobs Storm as a streaming service for near real time processing Hadoop 2.4 support for 100x query gains

More information

Competency Map for the Data Science and Analytics-Enabled Graduate

Competency Map for the Data Science and Analytics-Enabled Graduate Competency Map for the Data Science and Analytics-Enabled Graduate Purpose of Competency Map The purpose of this competency map is to identify the specific skills, knowledge, abilities, and attributes

More information

The Evolution of Big Data

The Evolution of Big Data The Evolution of Big Data Andrew Fast, Ph.D. Chief Scientist fast@elderresearch.com Headquarters 300 W. Main Street, Suite 301 Charlottesville, VA 22903 434.973.7673 fax 434.973.7875 www.elderresearch.com

More information

Cloud-Scale Data Platform

Cloud-Scale Data Platform Guide to Supporting On-Premise Spark Deployments with a Cloud-Scale Data Platform Apache Spark has become one of the most rapidly adopted open source platforms in history. Demand is predicted to grow at

More information

Engaging in Big Data Transformation in the GCC

Engaging in Big Data Transformation in the GCC Sponsored by: IBM Author: Megha Kumar December 2015 Engaging in Big Data Transformation in the GCC IDC Opinion In a rapidly evolving IT ecosystem, "transformation" and in some cases "disruption" is changing

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 1, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2457

More information

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved. Apache Spark 2.0 GA The General Engine for Modern Analytic Use Cases 1 Apache Spark Drives Business Innovation Apache Spark is driving new business value that is being harnessed by technology forward organizations.

More information

EDW MODERNIZATION & CONSUMPTION

EDW MODERNIZATION & CONSUMPTION EDW MODERNIZATION & CONSUMPTION RAPIDLY. AT ANY SCALE. TRANSFORMING THE EDW TO BIG DATA/CLOUD VISUAL DATA SCIENCE AND ETL WITH APACHE SPARK FASTEST BI ON BIG DATA AT MASSIVE SCALE Table of Contents Introduction...

More information

Taking Advantage of Cloud Elasticity and Flexibility

Taking Advantage of Cloud Elasticity and Flexibility Taking Advantage of Cloud Elasticity and Flexibility Fred Koopmans Sr. Director of Product Management 1 Public cloud adoption is surging 2 Cloudera customers are leading the way 3 Hadoop was born for the

More information

Analytics for All Your Data: Cloud Essentials. Pervasive Insight in the World of Cloud

Analytics for All Your Data: Cloud Essentials. Pervasive Insight in the World of Cloud Analytics for All Your Data: Cloud Essentials Pervasive Insight in the World of Cloud The Opportunity We re living in a world where just about everything we see, do, hear, feel, and experience is captured

More information

Preface About the Book

Preface About the Book Preface About the Book We are living in the dawn of what has been termed as the "Fourth Industrial Revolution" by the World Economic Forum (WEF) in 2016. The Fourth Industrial Revolution is marked through

More information

Apache Hadoop in the Datacenter and Cloud

Apache Hadoop in the Datacenter and Cloud Apache Hadoop in the Datacenter and Cloud The Shift to the Connected Data Architecture Digital Transformation fueled by Big Data Analytics and IoT ACTIONABLE INTELLIGENCE Cloud and Data Center IDMS Relational

More information