Adopting Big Data Technologies in the Support of Official Statistical Production: Opportunities, Experiences and Lessons Learned
Antonino Virgillito, Istat (Istituto Nazionale di Statistica), 31/07/2017
Slide 2: Introduction
- The use of Big Data sources in the production of official statistics has been at the core of several national and international initiatives in recent years.
- Among the questions raised by the use of Big Data for statistics, a specific one concerns the novel IT tools available for handling Big Data.
- This presentation gives an overview of Big Data technology and presents the different ways in which it can support statistical production, analysing the experiences gained at Istat and at international level.
Slide 3: Overview of Big Data Tools
- Big Data tool: a tool specifically designed to cope with aspects such as large data size and loose structure.
- What are the characteristics of these tools?
- How can they be used in the context of statistical production?
Slide 4: Categories of Big Data Tools: Distributed Computing Platforms
- Clusters of interconnected machines working as a whole to store and process data.
- Examples: Hadoop, Spark.
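
The idea of machines "working as a whole" can be illustrated with a minimal sketch of the split, process and merge pattern that such platforms build on. This is not Hadoop or Spark code: threads stand in for cluster nodes, and the function names and the sample product list are hypothetical, chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Deal records round-robin into n_partitions lists (stand-ins for cluster nodes)."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_partition(partition):
    """'Map' step: each worker counts the items found in its own partition."""
    return Counter(partition)


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-partition counts into a single result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_count(records, n_partitions=4):
    """Count occurrences of each record by processing partitions in parallel."""
    partitions = split_into_partitions(records, n_partitions)
    # Threads are an assumption of this sketch; a real cluster uses separate machines.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_partition, partitions))
    return merge_counts(partial_counts)


if __name__ == "__main__":
    sales = ["milk", "bread", "milk", "eggs", "bread", "milk"]
    print(distributed_count(sales, n_partitions=3))
```

Because each partition is counted independently and the partial results are merged afterwards, adding workers does not change the answer, only how the work is spread.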
Slide 5: Categories of Big Data Tools: Massively Parallel Processing (MPP) Databases
- Provide real-time querying with fast response times over large data sets.
- Examples: Dremel, Drill, Impala.
Slide 6: Categories of Big Data Tools: NoSQL Databases
- Not based on the traditional tabular data model, but capable of handling non-structured data.
- Examples: HBase, MongoDB, Elasticsearch.
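
To make the contrast with the tabular model concrete, here is a minimal sketch of a document-style collection, assuming plain Python dictionaries as stand-ins for stored documents. It does not use the API of any of the products named above; the records, field names and helper functions are hypothetical.

```python
# Hypothetical records: each document carries its own set of fields,
# so no single table schema fits all of them.
documents = [
    {"id": 1, "name": "Alpha Srl", "website": "https://alpha.example", "employees": 12},
    {"id": 2, "name": "Beta SpA", "social": {"twitter": "@beta"}},
    {"id": 3, "name": "Gamma", "website": "https://gamma.example", "tags": ["retail", "online"]},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal all the given criteria.

    Documents that lack a requested field simply do not match.
    """
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def field_coverage(docs):
    """Count how many documents contain each top-level field."""
    coverage = {}
    for d in docs:
        for key in d:
            coverage[key] = coverage.get(key, 0) + 1
    return coverage


if __name__ == "__main__":
    print(find(documents, name="Beta SpA"))
    print(field_coverage(documents))
```

In a relational table the missing `website` or `employees` values would have to be stored as nulls in fixed columns; here each document just omits what it does not have.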
Slide 7: Uses of Big Data Tools
- RDBMS offload
- Big Data staging
- Experiments
- Heavy processing
Slide 8: Case Study: UNECE Sandbox
- Web-based collaborative environment, hosted in Ireland by ICHEC (Irish Centre for High-End Computing).
- UNECE Big Data Project ( )
- High-Level Group for the Modernisation of Official Statistics (HLG-MOS)
- Objective: to better understand how to use the power of Big Data to support the production of official statistics.
- Both tools and data sources are available.
- Currently used as:
  - Training platform for the ESTP courses on Big Data
  - Shared test environment for the ESS Big Data project promoted by Eurostat
Slide 9: Case Study: Istat Big Data Platform
- In-house Big Data platform: 8-node Hadoop cluster + Spark + MPP DB.
- Designed for use in both production and experimental projects:
  - Scanner data
  - Population estimates with mobile phone data
- Completely hosted on-premise and managed by internal staff:
  - Motivated by privacy constraints on the datasets
  - Costly solution: evaluate cloud when possible
Slide 10: Lessons Learned: The Problem with Size
- Volume is the dimension with the highest practical impact in real Big Data projects.
- Datasets are in the order of terabytes (TB) in size and constantly growing.
- There is no absolute threshold above which data becomes "big": it depends strictly on the kind of processing the data must undergo.
- Real datasets are smaller than what many companies handle daily, but uncommon for an NSI.
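
A small worked example may help show why the threshold depends on the processing rather than on size alone. The sketch below compares the number of elementary operations needed for a single pass over a dataset with the number needed to compare every record against every other one. The two workloads and the function names are assumptions chosen for illustration; they are not taken from the Istat projects.

```python
def linear_pass_operations(n_records):
    """Operations for one pass over the data (e.g. a sum or a filter)."""
    return n_records


def pairwise_comparisons(n_records):
    """Comparisons needed to match every record against every other one: n(n-1)/2."""
    return n_records * (n_records - 1) // 2


if __name__ == "__main__":
    n = 1_000_000
    print(f"records:              {n:,}")
    print(f"single pass:          {linear_pass_operations(n):,} operations")
    print(f"all-pairs comparison: {pairwise_comparisons(n):,} comparisons")
```

For one million records, a single pass touches one million items, while an all-pairs comparison needs roughly five hundred billion comparisons. The same dataset is therefore small for the first task and "big" for the second.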
Slide 11: Lessons Learned: IT Tools as Enablers of Advanced Analytics
- Big Data tools enable operations and analyses that are not possible with standard tools.
- Alternatively, they can provide better performance than traditional tools when dealing with large data sets.
- [Chart: example execution times of real operations in the Istat Scanner Data project, Big Data tools vs RDBMS]
Slide 12: Lessons Learned: Skills and Roles
- The relationship between IT staff and statisticians within a statistical organization is a particularly critical issue when working with Big Data.
- How can the potential of technology be exploited without compromising the autonomy of research?
- Complete separation of concerns, as commonly experienced in NSIs, may lead to inefficiencies.
- Capacity building is crucial.
- A new mentality is needed: a mix of competences and a collaborative approach.
Slide 13: Conclusions
- The rise of Big Data has created new problems for statistical organizations, which lie at the intersection of statistical analysis and IT.
- Technology is now mature enough to offer huge potential and to help organizations answer new and more complex questions.
- A paradigm shift is needed in the approach to those aspects of the statistical business that need to exploit new technologies:
  - A more cohesive and mixed approach between IT and statisticians
- No matter how big the datasets we manage, this is only part of the general problem of how statistical organizations will make the transition from traditional methods to the modern challenges of data science.