Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

Similar documents
Hadoop Course Content

Hadoop Roadmap 2012 A Hortonworks perspective

5th Annual. Cloudera, Inc. All rights reserved.

Bringing the Power of SAS to Hadoop Title

Introduction to Big Data(Hadoop) Eco-System The Modern Data Platform for Innovation and Business Transformation

Architecting an Open Data Lake for the Enterprise

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

Microsoft Azure Essentials

MapR: Solution for Customer Production Success

Big Business Value from Big Data and Hadoop


Big data is hard. Top 3 Challenges To Adopting Big Data

BIG DATA AND HADOOP DEVELOPER

Architecture Optimization for the new Data Warehouse. Cloudera, Inc. All rights reserved.

Hortonworks Data Platform

Jason Virtue Business Intelligence Technical Professional

Modernizing Your Data Warehouse with Azure

Angat Pinoy. Angat Negosyo. Angat Pilipinas.

Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved.

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Course Content. The main purpose of the course is to give students the ability plan and implement big data workflows on HDInsight.

AZURE HDINSIGHT. Azure Machine Learning Track Marek Chmel

Apache Hadoop in the Datacenter and Cloud

Big Data & Hadoop Advance

Microsoft Big Data. Solution Brief

20775A: Performing Data Engineering on Microsoft HD Insight

Hybrid Data Management

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud

20775 Performing Data Engineering on Microsoft HD Insight

Common Customer Use Cases in FSI

Transforming Analytics with Cloudera Data Science WorkBench

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

APAC Big Data & Cloud Summit 2013

20775: Performing Data Engineering on Microsoft HD Insight

Investor Presentation. Fourth Quarter 2015

Big Data Job Descriptions. Software Engineer - Algorithms

Investor Presentation. Second Quarter 2016

20775A: Performing Data Engineering on Microsoft HD Insight

StackIQ Enterprise Data Reference Architecture

Business is being transformed by three trends

Optimal Infrastructure for Big Data

IBM Big Data Summit 2012

Cask Data Application Platform (CDAP)

Modern Data Architecture with Apache Hadoop

Data Analytics. Nagesh Madhwal Client Solutions Director, Consulting, Southeast Asia, Dell EMC

Hortonworks Connected Data Platforms

Analytics Platform System

Realising Value from Data

Building Your Big Data Team

GET MORE VALUE OUT OF BIG DATA

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Datametica. The Modern Data Platform Enterprise Data Hub Implementations. Why is workload moving to Cloud

Why Big Data Matters? Speaker: Paras Doshi

Teradata IntelliSphere

Operational Hadoop and the Lambda Architecture for Streaming Data

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Azure Data Analytics & Machine Learning Seminar. Daire Cunningham: BI Practice Area Manager

Intro to Big Data and Hadoop

Evolution to Revolution: Big Data 2.0

Spark and Hadoop Perfect Together

Pentaho Technical Overview. Max Felber Solution Engineer September 22, 2016

ETL challenges on IOT projects. Pedro Martins Head of Implementation

Got Data Silos? Automate Data Ingestion Into Isilon In Support Of Analytics

Cask Data Application Platform (CDAP) Extensions

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Exploring Big Data and Data Analytics with Hadoop and IDOL. Brochure. You are experiencing transformational changes in the computing arena.

Confidential

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

ETL on Hadoop What is Required

ABOUT THIS TRAINING: This Hadoop training will also prepare you for the Big Data Certification of Cloudera- CCP and CCA.

THE CIO GUIDE TO BIG DATA ARCHIVING. How to pick the right product?

Insights to HDInsight

Architecture Overview for Data Analytics Deployments

Amsterdam. (technical) Updates & demonstration. Robert Voermans Governance architect

Cognitive Data Warehouse and Analytics

1. Intoduction to Hadoop

REDEFINE BIG DATA. Zvi Brunner CTO. Copyright 2015 EMC Corporation. All rights reserved.

Trifacta Data Wrangling for Hadoop: Accelerating Business Adoption While Ensuring Security & Governance

Enterprise Analytics Accelerating Your Path to Value with an Open Analytics Platform

Analyze Big Data Faster and Store it Cheaper. Dominick Huang CenterPoint Energy Russell Hull - SAP

Practices of Business Intelligence. (Business Intelligence, Analytics, and Data Science)

Hadoop Integration Deep Dive

Sr. Sergio Rodríguez de Guzmán CTO PUE

Contents at a Glance COPYRIGHTED MATERIAL. Introduction... 1 Part I: Getting Started with Big Data... 7

COPYRIGHTED MATERIAL. 1Big Data and the Hadoop Ecosystem

Big Data Analytics met Hadoop

Datametica DAMA. The Modern Data Platform Enterprise Data Hub Implementations. What is happening with Hadoop Why is workload moving to Cloud

DLT AnalyticsStack. Powering big data, analytics and data science strategies for government agencies

Big Data with Azure: where to begin?

zdata Solutions BI / Advanced Analytic Platform and Pilot Programs

The Internet of Things Wind Turbine Predictive Analytics. Fluitec Wind s Tribo-Analytics System Predicting Time-to-Failure

Hybrid Data Management

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

How Data Science is Changing the Way Companies Do Business Colin White

Ensuring Trust in Big Data with SAP EIM Solutions. Scott Barrett Senior Director, Information Management Database & Technology Centre of Excellence

E-guide Hadoop Big Data Platforms Buyer s Guide part 3

Hadoop Stories. Tim Marston. Director, Regional Alliances Page 1. Hortonworks Inc All Rights Reserved

Big data beyond Hadoop How to integrate ALL your data. Kai waehner.de 4/26/13

Big Data Trends Arató Bence. BI Consulting

Transcription:

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop Rohit Bakhshi, Solution Architect, Hortonworks Jim Walker, Director Product Marketing, Talend Page 1

About Us Rohit Bakhshi Solution Architect at Hortonworks Experience Hadoop in enterprise architecture Building advanced analytical applications Enjoys live jazz and drinking espresso Jim Walker Director Product Marketing at Talend Experience 10 years as developer, 10 as marketer Computer Security, DQ, MDM Big data Is a bit of a foodie and enjoys baseball (White Sox) Page 2

Agenda Introduction Impact Big Data in the Enterprise Hortonworks Data Platform Talend Overview Demo Q&A Page 3

Hortonworks Vision We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop How to achieve that vision??? Enable ecosystem around enterprise-viable open source data platform. Page 4

Hortonworks Strategic Focus Enable Hadoop to be next-generation enterprise data platform Lead within Hadoop Community Engineering team that delivered every major Hadoop release since 0.1 Experience managing world s largest deployment Ongoing access to Y! s 1,000+ users and 40k+ nodes for testing, QA, etc. Unify & Enable Hadoop Ecosystem Provide 100% open source product Empower customers and partners overcome Hadoop knowledge gaps Enable organizations successfully develop and deploy solutions based on Hadoop Expert Role-based Training Full Lifecycle Support and Services Evaluate Pilot Production Page 5

Impact of Big Data on Data Analytics One of our top 5 IT priorities, 45% One of our top 10 IT priorities, 27% Source: Enterprise Strategy Group, 2012 Page 6

Transactions Interactions Observations AND FAR FAR BEYOND Chart content courtesy of Teradata, Inc. Page 7

Data-Driven Business The days are over when you build a product once and it just works. You have to take ideas, test them, iterate them, use data and analytics to understand what works and what doesn't in order to be successful. And that's how we use our big data infrastructure. Aaron Batalion, CTO of LivingSocial The Big Promise of Big Data, PCWorld, March 13, 2012 http://www.pcworld.com/businesscenter/article/251754/the_big_promise_of_big_data.html Page 8

What is Apache Hadoop? Solution for Big Data Designed for volume, velocity, variety & complexity of data Data Platform Deployed on Commodity Hardware that Stores petabytes of data reliably Runs highly distributed applications Enables a rational economics model Set of Open Source Projects Apache Software Foundation Loosely coupled, ship early/often One of the best examples of open source driving innovation and creating a market Page 9

Connecting All Of Your Big Data Serving Applications Traditional Data Warehouses, BI & Analytics Web Serving NoSQL RDMS EDW Data Marts BI / Analytics Store, Transform, Refine, Iterative Analytics, Archive all data Serving Logs Social Media Sensor Data Text Systems Unstructured Systems Page 10

Bridging Classic & Big Data Worlds Integrating EDW & Hadoop Capture only what s needed Classic Method Structured & Repeatable Analysis Business determines what questions to ask EDW IT structures the data to answer those questions Capture in case it s needed Hadoop IT delivers a platform for storing, refining, and analyzing all data sources Big Data Method Multi-structured & Iterative Analysis Business explores data for questions worth answering Page 11

Bridging Classic & Big Data Worlds Enabling Developers, Data Scientists, and Business Analysts Java, C/C++, Pig, JavaScript, Python, R, SAS, SQL, Excel, Reporting, etc. Ingest, Transform, Archive, Discover, Explore, Analyze, Report Fast data loading ELT/ETL and refinement Iterative analysis Online archival Path & pattern analysis Graph analysis Text analysis Machine learning Operational analysis Transactional analysis High volume ad-hoc Elastic data marts Batch Interactive Active Page 12

Key Components of Hadoop Stack Core Components Extended Components Zookeeper (Cluster Coordination) HBase (Columnar NoSQL Store) Pig (Data Flow) Hive (SQL) MapReduce (Distributed Programing Framework) HCatalog (Table & Schema Management) Ambari & Other Monitoring & Management Oozie & Other Workflow Scheduling Sqoop & Other Ingest, ETL tools HDFS (Hadoop Distributed File System) Mahout & Other libraries Page 13

Enable Ecosystem Around Platform Hortonworks Data Platform Operational APIs Operations The market needs a platform that is Open across 3 facets: Data: directly processes as well as coexists/integrates with any data flowing through a business Apps: delivers business value by enabling innovative new apps and enhancing existing apps Operations: integrates with operational models within the enterprise datacenter and the cloud Page 14

Talend and big data everything old is new again! Page 15

Challenge 1: Where project management? BIG data enables governance BIG information supports BIG decisions Information provides value to the business If you can't rely on your information then the result can be missed opportunities, or higher costs. BIG business drives Matthew West and Julian Fowler (1999). Developing High Quality Data Models. The European Process Industries STEP Technical Liaison Executive (EPISTLE). Page 16

Challenge 2: Data Quality Poor Data Quality * Big Data = Big Problems^2 Page 17

Challenge 3: Complex technology, limited resources volume, velocity, variety resources? Page 18

Talend Big Data Strategy Big Data Integration - Land data in a BD cluster without coding - Code generation for Hadoop HDFS, Hive, Sqoop Big Data Manipulation Simplify manipulation, such as sort and filter Pig components, HBase Big Data Quality & Governance Identify linkages & duplicates, validate big data Match component, execute basic quality features Big Data Project Management 4 Place frameworks around big data projects Common Repository, scheduling, monitoring strategic pillars Page 19

Why Talend Page 20

Why Talend Page 21

Demonstration Page 22 Talend 2011 22

2012 Talend Roadmap data big 2012 spring fall Computationally Intense Functions Matching (complete) Profiling Parsing Survivorship continue to monitor winning technologies & expand partnerships Page 23

Talend Open Studio for Big Data Democratize Big Data Pig Talend Open Studio for Big Data Improves efficiency of big data job design with graphic interface Abstracts and generates code Run transforms inside Hadoop Native support for HDFS, Pig, Hbase, Sqoop and Hive an open source ecosystem Apache License Available at talend.com Embedded in HWx Data Platform Page 24

Other Resources Next webinar: HDFS Federated April 18, 2012 @ 10am PST Register now :http://hortonworks.com/webinars/ Hadoop Summit June 13-14 San Jose, California www.hadoopsummit.org Hadoop Training and Certification Developing Solutions Using Apache Hadoop Administering Apache Hadoop http://hortonworks.com/training/ Page 25

Thank You! Questions? Page 26