ABOUT THIS TRAINING:

The world of Hadoop and "Big Data" can be intimidating: hundreds of different technologies with cryptic names form the Hadoop ecosystem. This comprehensive training has been designed by industry experts, with current industry job requirements in mind, to provide in-depth learning on Big Data and Hadoop modules. This industry-oriented program combines the training courses for Hadoop developer, Hadoop administrator, Hadoop testing, and analytics. This Hadoop training will also prepare you for the Cloudera Big Data certifications, CCP and CCA.

With this course, you'll not only understand what those systems are and how they fit together, but you'll also go hands-on and learn how to use them to solve real business problems. It's not just theory: the course is filled with hands-on activities and exercises, backed by plenty of discussion of use cases from different clients across domains. We believe this will help you build confidence in the technology and its usage.

You'll find a range of activities in this course for people at every level. If you're a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you're comfortable with command lines, we'll show you how to work with them too. And if you're a programmer, we'll challenge you with writing real scripts on a Hadoop system using Java, Scala, Pig Latin, and Python.

YOUR TAKEAWAY FROM TRAINING:

You'll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you'll be able to apply Hadoop to real-world problems. Plus, a valuable completion certificate is waiting for you at the end!

WHAT YOU WILL LEARN IN THIS BIG DATA HADOOP ONLINE TRAINING COURSE?

1. Gain a detailed understanding of Big Data analytics.
2. Master the fundamentals of Hadoop 2.8 and YARN for designing distributed systems that manage "big data", using Hadoop and related technologies to store and analyze data at scale.
3. Understand the architecture of HDFS and MapReduce for parallel storage and parallel processing.
4. Understand the configuration choices you should make for stability, reliability, and optimized task scheduling on your distributed system.
5. Analyze relational data using Hive and MySQL (connecting Hadoop to other databases).
6. Analyze non-relational data using HBase.
7. Use Pig and Spark to create scripts that process data on a Hadoop cluster in more complex ways.
8. Understand how Hadoop clusters are managed by YARN, ZooKeeper, and Hue.
9. Know how to schedule your Hadoop jobs using Oozie.
10. Collect data from a variety of sources into your Hadoop cluster using Sqoop and Flume.
11. Master Hadoop administration activities such as cluster management, monitoring, administration, and troubleshooting.
12. Learn to test Hadoop applications using MRUnit and other automation tools.
13. Know the basics of Spark, Spark RDDs, GraphX, and MLlib, and of writing Spark applications.
14. Practice real-life projects using Hadoop and Apache Spark.
15. Discuss industry use cases from different clients across domains.
16. Be equipped to clear the Big Data Hadoop certifications (CCP and CCA).
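To make the MapReduce model named in item 3 more concrete, here is a minimal sketch of its three phases (map, shuffle, reduce) applied to a word count. It is written in plain Python and does not use the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count`, as well as the sample input, are illustrative assumptions. On a real cluster the mappers and reducers would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence (hypothetical driver, single process)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop stores big data"]
    print(word_count(sample))
```

Because each mapper only looks at its own line and each reducer only at its own key, both phases can be spread across machines, which is the point of the model.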

RECOMMENDED SKILLS PRIOR TO TAKING THIS COURSE

There is no prerequisite for taking this Big Data training and mastering Hadoop, but a basic knowledge of UNIX, SQL, and Java or Python is helpful. At GraduIT, we provide a complimentary self-paced UNIX and Java course with our Big Data Hadoop training to brush up the required skills, so that you start your Hadoop learning path well prepared.

WHO SHOULD TAKE THIS BIG DATA HADOOP TRAINING COURSE?

1. Programming developers and system administrators
2. Experienced working professionals and project managers
3. Big Data Hadoop developers eager to learn other verticals such as testing, analytics, and administration
4. Business intelligence, data warehousing, and analytics professionals
5. Graduates and undergraduates eager to learn the latest Big Data technology, who can take this Big Data Hadoop certification online training

BIT ON HADOOP:

Hadoop enables you to BUILD AN INSIGHT-DRIVEN BUSINESS. To be specific, Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it's not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

CURRICULUM:

Module 1: Understanding Big Data and Hadoop
Introduction to Big Data and Hadoop
Discussion on Big Data, its sources, and the challenges related to it
Understanding the attributes of Big Data and the different data varieties
Discussing use cases on the business opportunity in Big Data
Discussion on different solutions for problems related to Big Data
Comparison of Hadoop vs. traditional systems solutions
Understanding the complete solution architecture, from data acquisition to data analysis for business
Discussion on the technologies used in an end-to-end solution for a Big Data analysis-driven business
Session on Python, UNIX, and Java (self-paced learning videos)

Module 2: Understanding Hadoop from a Thousand-Feet View
A bit on the history of Hadoop and its evolution
Understanding parallel processing and parallel storage architecture
Relating MapReduce and HDFS architecture to the above
Overview of all technology stacks related to the Hadoop ecosystem
Discussion on the different distributions and vendors of Hadoop
Installation and set-up of a Hadoop cluster (preferably Cloudera)

Module 3: Understanding the Parallel Storage Solution with HDFS
Understanding HDFS architecture in detail
Understanding Hadoop master-slave architecture
Understanding NameNode, DataNode, and Secondary NameNode
Discussion on blocks and data replication
Learning common HDFS commands and practicing them
Discussion on a typical production cluster and its configurations for stability, reliability, and optimization
Understanding the anatomy of file read and write operations in HDFS
Discussion on Hadoop 2.x cluster architecture: Federation and High Availability
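The blocks and data replication topic in Module 3 comes down to simple arithmetic, sketched below in plain Python. The sketch assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`), both of which are configurable per cluster; the helper name `hdfs_footprint` is hypothetical and not part of any Hadoop library.

```python
import math

BLOCK_SIZE_MB = 128      # assumed: Hadoop 2.x default for dfs.blocksize
REPLICATION_FACTOR = 3   # assumed: default for dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB,
                   replication=REPLICATION_FACTOR):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the DataNodes.
    replicas = blocks * replication
    # A partial last block only occupies the space it actually holds, so raw
    # storage is file size times replication, not blocks times block size.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb


if __name__ == "__main__":
    blocks, replicas, raw_mb = hdfs_footprint(1000)
    print(f"{blocks} blocks, {replicas} block replicas, {raw_mb} MB raw storage")
```

Under these assumptions, a 1000 MB file is split into 8 blocks, stored as 24 block replicas, and consumes 3000 MB of raw disk across the cluster.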

Module 4: Understanding the Parallel Processing Solution with MapReduce
Understanding Hadoop 2.x MapReduce architecture (YARN architecture)
Understanding Hadoop 2.x MapReduce components
Discussion on the YARN MR application execution flow
Discussion on the anatomy of a MapReduce program workflow
Discussion on basic MapReduce API concepts
Writing MapReduce drivers, mappers, and reducers using Java
Demo on MapReduce program execution
Understanding input splits and their relationship with HDFS blocks
Discussion on advanced concepts:
   o Distributed Cache
   o Combiner and Partitioner
   o Counters
   o Map-side and reduce-side joins
   o Use of compression techniques (Snappy, LZO and Zip)
   o Advanced data types in MapReduce (Writable and WritableComparable)
Hands-on exercises on MapReduce program execution
Demo on a telecom data set

Module 5: Learn to Analyze Relational Data Using Hive
Understanding of the Hive framework and components
Discussion on the relationship between Hive and MapReduce (when to choose what)
Understanding and hands-on:
   o Hive data types
   o Hive DDL: create/show/drop databases and tables
   o Hive DML: load files and insert data
   o Hive SQL: select, filter, join, group by
Understanding of internal and external tables
Understanding of programming structure in UDFs, partitions, and buckets
Discussion on the limitations of Hive
Discussion on Hive on Spark

Module 6: Learn the Data Flow ETL Scripting Language Pig
Introduction to Apache Pig
Discussion on the relationship between Hive, MapReduce, and Pig
Grunt shell and utility components
Different data types in Pig
Programming structure in Pig
Modes of execution in Pig
Experiencing Pig scripts by writing evaluation, filter, load, and store functions
UDFs in Pig
Understanding the integration of HBase with Pig (after completion of the HBase module)

Module 7: Learn to Analyze Non-Relational Data Using HBase
Introduction to NoSQL databases
Understanding HBase, a non-relational distributed database, and its architecture
When and why to use HBase
Understanding the HBase client API
Know how to load data into HBase
Know how to query data from HBase
Discussion on the relationship between Hive, MapReduce, Pig, and HBase
Overview of other non-relational databases such as Cassandra and MongoDB, their advantages over RDBMS, and career paths in NoSQL databases

Module 8: Learn How to Collect Data from a Variety of Sources into Your Hadoop Cluster Using Sqoop and Flume
Understand what Sqoop is and why it is used
Understand Sqoop architecture
Learn how to import data from an RDBMS into HDFS, Hive, or HBase using Sqoop
Understand what Flume is and why it is used
Understand Flume architecture
Learn how to efficiently collect, aggregate, and move large amounts of streaming data into the Hadoop Distributed File System (HDFS)

Module 9: Learn How to Schedule Hadoop Jobs Using Oozie
Introduction to workflow management and Oozie
Learn Oozie components and workflows
Oozie commands
Experience scheduling jobs using Oozie

Module 10: Discussion on Emerging Technologies in the Big Data Landscape
Discussion on:
   o Emerging trends in the Big Data landscape
   o Introduction to Spark, an in-memory parallel processing framework for real-time/near-real-time (NRT) operations
Best practices for Hadoop developers
Discussion on interview preparation