Hadoop Course Content

Similar documents
BIG DATA AND HADOOP DEVELOPER

Big Data & Hadoop Advance

Simplifying the Process of Uploading and Extracting Data from Apache Hadoop

20775: Performing Data Engineering on Microsoft HD Insight

MapR: Solution for Customer Production Success

1Week. Big Data & Hadoop. Why big data & Hadoop is important? National Winter Training program on

1. Intoduction to Hadoop

Big Data. By Michael Covert. April 2012

HADOOP ADMINISTRATION

Redefine Big Data: EMC Data Lake in Action. Andrea Prosperi Systems Engineer

Cask Data Application Platform (CDAP)

Big Data Job Descriptions. Software Engineer - Algorithms

Apache Spark 2.0 GA. The General Engine for Modern Analytic Use Cases. Cloudera, Inc. All rights reserved.

Welcome! 2013 SAP AG or an SAP affiliate company. All rights reserved.

Building Your Big Data Team

Big Business Value from Big Data and Hadoop

E-guide Hadoop Big Data Platforms Buyer s Guide part 1

ORACLE DATA INTEGRATOR ENTERPRISE EDITION

BIG DATA and DATA SCIENCE

5th Annual. Cloudera, Inc. All rights reserved.

Peers Techno log ies Pv t. L td. CLOUD COMPUTING

SAS and Hadoop Technology: Overview

Digging into Hadoop-based Big Data Architectures

Hadoop in Production. Charles Zedlewski, VP, Product

COPYRIGHTED MATERIAL. 1Big Data and the Hadoop Ecosystem

Why Big Data Matters? Speaker: Paras Doshi

Post Graduate Program in BIG DATA ENGINEERING. In association with 11 MONTHS ONLINE

Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing Ryan Packer, Bank of New Zealand

Hadoop Integration Deep Dive

New Big Data Solutions and Opportunities for DB Workloads

ENABLING GLOBAL HADOOP WITH DELL EMC S ELASTIC CLOUD STORAGE (ECS)

Insights to HDInsight

Business is being transformed by three trends

E-guide Hadoop Big Data Platforms Buyer s Guide part 3

ETL on Hadoop What is Required

Hadoop. by Dirk deroos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, and Roman B. Melnyk

IBM Big Data Summit 2012

Microsoft Azure Essentials

BIG DATA PROCESSING A DEEP DIVE IN HADOOP/SPARK & AZURE SQL DW

Hortonworks Data Platform. Buyer s Guide

Bringing the Power of SAS to Hadoop Title

Hadoop and Analytics at CERN IT CERN IT-DB

Data Analytics and CERN IT Hadoop Service. CERN openlab Technical Workshop CERN, December 2016 Luca Canali, IT-DB

Azure ML Data Camp. Ivan Kosyakov MTC Architect, Ph.D. Microsoft Technology Centers Microsoft Technology Centers. Experience the Microsoft Cloud

Cask Data Application Platform (CDAP) The Integrated Platform for Developers and Organizations to Build, Deploy, and Manage Data Applications

Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Cloud Integration and the Big Data Journey - Common Use-Case Patterns

COURSE OUTLINE: Implementing a Data Warehouse with SQL Server Implementing a Data Warehouse with SQL Server 2014

Implementing a Data Warehouse with Microsoft SQL Server

GET MORE VALUE OUT OF BIG DATA

Jason Virtue Business Intelligence Technical Professional

Operational Hadoop and the Lambda Architecture for Streaming Data

Leveraging Oracle Big Data Discovery to Master CERN s Data. Manuel Martín Márquez Oracle Business Analytics Innovation 12 October- Stockholm, Sweden

SAS & HADOOP ANALYTICS ON BIG DATA

Ray M Sugiarto MAPR Champion Indonesia

Data Analytics. Nagesh Madhwal Client Solutions Director, Consulting, Southeast Asia, Dell EMC

Deloitte School of Analytics. Demystifying Data Science: Leveraging this phenomenon to drive your organisation forward

Sr. Sergio Rodríguez de Guzmán CTO PUE


SAP Predictive Analytics Suite

Cloud Based Analytics for SAP

Data Analytics with MATLAB Adam Filion Application Engineer MathWorks

Building a Data Lake with Spark and Cassandra Brendon Smith & Mayur Ladwa

Real World Use Cases: Hadoop & NoSQL in Production. Big Data Everywhere London 4 June 2015

Data Analytics for Semiconductor Manufacturing The MathWorks, Inc. 1

KnowledgeENTERPRISE FAST TRACK YOUR ACCESS TO BIG DATA WITH ANGOSS ADVANCED ANALYTICS ON SPARK. Advanced Analytics on Spark BROCHURE

Guide to Modernize Your Enterprise Data Warehouse How to Migrate to a Hadoop-based Big Data Lake

Oracle Big Data Cloud Service

Pentaho 8.0 Overview. Pedro Alves

Setup HSP Ambari Cluster

IBM SPSS & Apache Spark

The Intersection of Big Data and DB2

Education Course Catalog Accelerate your success with the latest training in enterprise analytics, mobility, and identity intelligence.

Integrating MATLAB Analytics into Enterprise Applications

Common Customer Use Cases in FSI

Sunnie Chung. Cleveland State University

Big Data Trends Arató Bence. BI Consulting

Brian Macdonald Big Data & Analytics Specialist - Oracle

KnowledgeSTUDIO. Advanced Modeling for Better Decisions. Data Preparation, Data Profiling and Exploration

Pentaho 8.0 and Beyond. Matt Howard Pentaho Sr. Director of Product Management, Hitachi Vantara

NFLABS SIMPLIFYING BIG DATA. Real &me, interac&ve data analy&cs pla4orm for Hadoop

Designing Business Intelligence Solutions with Microsoft SQL Server 2014

Nouvelle Génération de l infrastructure Data Warehouse et d Analyses

Microsoft Big Data. Solution Brief

Adobe Deploys Hadoop as a Service on VMware vsphere

The Alpine Data Platform

Berkeley Data Analytics Stack (BDAS) Overview

HP Network Automation 7.2 Fundamentals and Administration

Case Study Qubole. Marc Rosen, Sr. Director of Data Analytics

HADOOP OPERATIONS PDF

ETL challenges on IOT projects. Pedro Martins Head of Implementation

MapR Pentaho Business Solutions

RDMA Hadoop, Spark, and HBase middleware on the XSEDE Comet HPC resource.

Reduce Money Laundering Risks with Rapid, Predictive Insights

Accelerating Your Big Data Analytics. Jeff Healey, Director Product Marketing, HPE Vertica

Building a Robust Analytics Platform

TechValidate Survey Report. Converged Data Platform Key to Competitive Advantage

MapR: Converged Data Pla3orm and Quick Start Solu;ons. Robin Fong Regional Director South East Asia

Cloud Assisted Trend Analysis of Twitter Data using Hadoop

A REVIEW ON HADOOP ARCHITECTURE FOR BIG DATA

Transcription:

Hadoop Course Content Hadoop Course Content Hadoop Overview, Architecture Considerations, Infrastructure, Platforms and Automation Use case walkthrough ETL Log Analytics Real Time Analytics Hbase for Developers : NoSQL Introduction Traditional RDBMS approach NoSQL introduction Hadoop & Hbase positioning Hbase Introduction What it is, what it is not, its history and common use-cases Hbase Client Shell, exercise Hbase Architecture Building Components Storage, B+ tree, Log Structured Merge Trees Region Lifecycle Read/Write Path

Hbase Schema Design Introduction to hbase schema Column Family, Rows, Cells, Cell timestamp Deletes Exercise - build a schema, load data, query data Hbase Java API Exercises Connection CRUD API Scan API Filters Counters Hbase MapReduce Hbase Bulk load Hbase Operations, cluster management Performance Tuning Advanced Features Exercise Recap and Q&A MapReduce for Developers Introduction Traditional Systems / Why Big Data / Why Hadoop Hadoop Basic Concepts/Fundamentals Hadoop in the Enterprise Where Hadoop Fits in the Enterprise Review Use Cases Architecture Hadoop Architecture & Building Blocks HDFS and MapReduce Hadoop CLI Walkthrough Exercise

MapReduce Programming Fundamentals Anatomy of MapReduce Job Run Job Monitoring, Scheduling Sample Code Walk Through Hadoop API Walk Through Exercise MapReduce Formats Input Formats, Exercise Output Formats, Exercise Hadoop File Formats MapReduce Design Considerations Hadoop File Formats MapReduce Algorithms Walkthrough of 2-3 Algorithms MapReduce Features Counters, Exercise Map Side Join, Exercise Reduce Side Join, Exercise Sorting, Exercise Use Case A (Long Exercise) Input Formats, Exercise Output Formats, Exercise MapReduce Testing Hadoop Ecosystem Oozie Flume Sqoop Exercise 1 (Sqoop) Streaming API Exercise 2 (Streaming API)

Hcatalog Zookeeper HBase Introduction Introduction HBase Architecture VIEW Types Default Views Overriden Views Normal Views MapReduce Performance Tuning Development Best Practice and Debugging Apache Hadoop for Administrators Hadoop Fundamentals and Architecture Why Hadoop, Hadoop Basics and Hadoop Architecture HDFS and Map Reduce Hadoop Ecosystems Overview Hive Hbase ZooKeeper Pig Mahout Flume Sqoop Oozie Hardware and Software requirements Hardware, Operating System and Other Software Management Console

Deploy Hadoop ecosystem services Hive ZooKeeper HBase Administration Pig Mahout Mysql Setup Security Enable Security Configure Users, Groups, Secure HDFS, MapReduce, HBase and Hive Configuring User and Groups Configuring Secure HDFS Configuring Secure MapReduce Configuring Secure HBase and Hive Manage and Monitor your cluster Command Line Interface Troubleshooting your cluster Introduction to Big Data and Hadoop Hadoop Overview Why Hadoop Hadoop Basic Concepts Hadoop Ecosystem MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, Hbase, Oozie, Mahout Where Hadoop fits in the Enterprise Review use cases Apache Hive & Pig for Developers Overview of Hadoop Why Hadoop Hadoop Basic Concepts Hadoop Ecosystem MapReduce, Hadoop Streaming, Hive, Pig, Flume, Sqoop, Hbase, Oozie, Mahout

Where Hadoop fits in the Enterprise Overview of Hadoop Big Data and the Distributed File System MapReduce Hive Introduction Why Hive? Compare vs SQL Use Cases Hive Architecture Building Blocks Hive CLI and Language (Exercise) HDFS Shell Hive CLI Data Types Hive Cheat-Sheet Data Definition Statements Data Manipulation Statements Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins Built-in Functions Union, Sub Queries, Sampling, Explain Hive Architecture Building Blocks Hive CLI and Language (Exercise) HDFS Shell Hive CLI Data Types Hive Cheat-Sheet Data Definition Statements Data Manipulation Statements Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins Built-in Functions Union, Sub Queries, Sampling, Explain

Hive Architecture Building Blocks Hive CLI and Language (Exercise) HDFS Shell Hive CLI Data Types Hive Cheat-Sheet Data Definition Statements Data Manipulation Statements Select, Views, GroupBy, SortBy/DistributeBy/ClusterBy/OrderBy, Joins Built-in Functions Union, Sub Queries, Sampling, Explain Hive Usecase implementation -(Exercise) Use Case 1 Use Case 2 Best Practices Advance Features Transform and Map-Reduce Scripts Custom UDF UDTF SerDe Recap and Q&A Pig Introduction Position Pig in Hadoop ecosystem Why Pig and not MapReduce Simple example (slides) comparing Pig and MapReduce Who is using Pig now and what are the main use cases Pig Architecture Discuss high level components of Pig Pig Grunt - How to Start and Use Pig Latin Programming Data Types Cheat sheet Schema

Expressions Commands and Exercise Load, Store, Dump, Relational Operations,Foreach, Filter, Group, Order By, Distinct, Join, Cogroup,Union, Cross, Limit, Sample, Parallel Use Cases (working exercise) Use Case 1 Use Case 2 Use Case 3 (compare pig and hive) Advanced Features, UDFs Best Practices and common pitfalls Mahout & Machine Learning Mahout Overview Mahout Installation Introduction to the Math Library Vector implementation and Operations (Hands-on exercise) Matrix Implementation and Operations (Hands-on exercise) Anatomy of a Machine Learning Application Classification Introduction to Classification Classification Workflow Feature Extraction Classification Techniques (Hands-on exercise) Evaluation (Hands-on exercise) Clustering Use Cases Clustering algorithms in Mahout K-means clustering (Hands-on exercise) Canopy clustering (Hands-on exercise) Clustering Mixture Models Probabilistic Clustering Dirichlet (Hands-on exercise) Latent Dirichlet Model (Hands-on exercise)

Evaluating and Improving Clustering quality (Hands-on exercise) Distance Measures (Hands-on exercise) Recommendation Systems Overview of Recommendation Systems Use cases Types of Recommendation Systems Collaborative Filtering (Hands-on exercise) Recommendation System Evaluation (Hands-on exercise) Similarity Measures Architecture of Recommendation Systems Wrap Up