BIG DATA ANALYTICS WITH HADOOP: 40-Hour Course

OVERVIEW

Learning Objectives
- Understanding Big Data
- Understanding the various types of data that can be stored in Hadoop
- Setting up and configuring Hadoop in pseudo-distributed mode and distributed mode
- Understanding how Big Data and Hadoop fit into the current environment and infrastructure

- Working with MapReduce programs
- Coding with the Hadoop ecosystem projects
- Performing data analytics using Pig and Hive
- Understanding and working on real-world use cases
- Implementing a Hadoop project
- Working on a live, real-life big data analytics project using the Hadoop ecosystem
- And much more

COURSE HIGHLIGHTS
- Detailed explanation of every topic with real-world examples, drawn both from industry and from day-to-day activities.
- Labs, assignments, and a test for every topic. Labs are performed on a virtual machine that will be shared with you.
- Project: a project based on a real use case will be assigned, with proper documentation and all required files.

COURSE FEES
Simple fee structure. No hidden costs.
25,000 INR
20,000 INR

SUPPORT
At any time during the course, and even after you complete it, you can send us your queries by email and we will get back to you within 24 hours.

CONTENT PROVIDED
- A Hadoop reference guide covering all the topics in the sessions
- A lab manual covering all the labs
- An exercise manual with assignments
- A question bank
- A government-recognised certificate

MODULE 1: Introduction to Big Data

INTRODUCTION TO BIG DATA ANALYTICS
- Big Data: what it is and its characteristics
- Some facts and figures
- Importance of Big Data
- The need to understand and analyze Big Data
- Basics of data analytics
- Problems with existing systems

MODULE 2: Introduction to Hadoop

INTRODUCTION TO HADOOP
- What is Hadoop?
- Architecture
- Hadoop job process
- File anatomy: read operations and write operations
- Useful configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml

HDFS (HADOOP DISTRIBUTED FILE SYSTEM)
- Significance of HDFS in Hadoop
- HDFS features
- Hadoop daemons and their functions: NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode

- Data storage in HDFS: blocks, heartbeats, data replication
- Accessing HDFS: CLI (command-line interface), Unix and Hadoop commands, Java-based approach
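Block storage and replication come down to simple arithmetic. The Java sketch below uses a hypothetical helper class (not part of the Hadoop API) to estimate how many blocks a file occupies and how much raw disk it consumes, given a block size and replication factor you pass in. For reference, Hadoop's default replication factor (dfs.replication) is 3, and the default block size is 64 MB in Hadoop 1.x and 128 MB from Hadoop 2.x onward.

```java
/**
 * Back-of-the-envelope HDFS storage estimate.
 * Hypothetical helper for illustration only; it is not a Hadoop API.
 */
public final class HdfsStorageEstimate {
    private HdfsStorageEstimate() {}

    /** Number of blocks a file occupies: ceiling(fileSize / blockSize). An empty file has no blocks. */
    public static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        if (blockSizeBytes <= 0) {
            throw new IllegalArgumentException("block size must be positive");
        }
        if (fileSizeBytes <= 0) {
            return 0;
        }
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /** Total block replicas stored across DataNodes: every block is copied replicationFactor times. */
    public static long blockReplicas(long fileSizeBytes, long blockSizeBytes, int replicationFactor) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replicationFactor;
    }

    /**
     * Raw disk used across the cluster. The last block is not padded to full size,
     * so this is simply fileSize * replication (checksum metadata overhead is ignored).
     */
    public static long rawBytesStored(long fileSizeBytes, int replicationFactor) {
        return Math.max(0, fileSizeBytes) * replicationFactor;
    }
}
```

For example, a 300 MB file with 128 MB blocks occupies 3 blocks (128 + 128 + 44 MB). With replication 3, that is 9 block replicas spread across the DataNodes and about 900 MB of raw disk.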

MAPREDUCE
- Introduction to MapReduce
- MapReduce architecture
- MapReduce programming model
- MapReduce algorithm and phases
- Basic MapReduce program: driver code, mapper code, reducer code
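The phases can be seen concretely before touching the Hadoop API. Here is a minimal in-memory simulation of Word Count in plain Java. It is an illustrative sketch, not Hadoop code: the class and method names are hypothetical, and real Hadoop jobs extend Mapper and Reducer and configure a Job in the driver. It mirrors only the data flow. Map emits (word, 1) pairs, the shuffle groups the pairs by key in sorted order, and reduce sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory simulation of the MapReduce phases for Word Count.
 * Not the Hadoop API: it only mimics map -> shuffle/sort -> reduce.
 */
public final class WordCountSimulation {
    private WordCountSimulation() {}

    /** Map phase: emit (word, 1) for every whitespace-separated token in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    /** Shuffle/sort phase: group values by key; TreeMap keeps keys sorted, as reducers receive sorted keys. */
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum all the counts emitted for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Driver: run map over every line, shuffle the intermediate pairs, then reduce each group. */
    public static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(intermediate).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }
}
```

Running `WordCountSimulation.run(List.of("Deer Bear River", "Car Car River", "Deer Car Bear"))` returns `{bear=2, car=3, deer=2, river=2}`. Lower-casing tokens is a choice made for this sketch. The standard Hadoop WordCount example is case-sensitive.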

LABS
- Configuring a pseudo-distributed Hadoop cluster
- Working with HDFS command-line options
- Running a Word Count program

MODULE 3: Hadoop Ecosystem

HADOOP ECOSYSTEM
- What is the ecosystem?
- Different ecosystem projects: Sqoop, Hive, Pig, Flume, Ambari, Hue

REVISION TEST
25 questions, 20 minutes

LABS
- Importing data using Sqoop and querying it using Hive
- Configuring a Flume agent
- Mini project: using all the ecosystem projects on one sample web-log data set

MODULE 4: A Brief Walk Through HDFS and MapReduce

DEEPER DIVE
- Advanced HDFS: Secondary NameNode, Federation, High Availability
- Advanced MapReduce: demo of precedence levels, partitioners, combiners
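Partitioners and combiners both act on map output before it reaches the reducers. The sketch below is plain Java with hypothetical names. `partition` applies the same formula as Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, but to `String` hash codes, so the reducer numbers differ from what Hadoop computes for `Text` keys. `combine` performs the map-side pre-aggregation that a Word Count combiner would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of map-side combining and hash partitioning (not Hadoop classes). */
public final class PartitionerCombinerSketch {
    private PartitionerCombinerSketch() {}

    /** HashPartitioner-style formula: mask the sign bit, then take the remainder by the reducer count. */
    public static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Combiner: collapse one mapper's (word, 1) output into partial (word, count) sums before the shuffle. */
    public static Map<String, Integer> combine(List<String> mapperOutputWords) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : mapperOutputWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    /** Route each combined record to the bucket of the reducer chosen by partition(). */
    public static List<Map<String, Integer>> route(Map<String, Integer> combined, int numReduceTasks) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> record : combined.entrySet()) {
            buckets.get(partition(record.getKey(), numReduceTasks)).put(record.getKey(), record.getValue());
        }
        return buckets;
    }
}
```

A combiner is only safe when the reduce operation is commutative and associative, as summing counts is. Hadoop may run it zero, one, or several times, so the job's final result must not depend on whether it runs.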

MODULE 5: Hadoop Administration

HADOOP ADMINISTRATION
- Cluster planning
- Understanding the hardware and software requirements of a Hadoop cluster
- Different modes of operation of Hadoop
- Precedence levels
- Some dos and don'ts

LABS
- Hive: creating different types of tables; executing different queries
- Pig: working with different data types; modes of execution; running some Pig application-driven commands
- Sqoop: import and export

MODULE 6: Data Visualization

DATA VISUALIZATION
- Working with the Hadoop ODBC connector
- Data visualization using Excel
- Exporting Hive data
- Creating graphs and interactive charts for your Hive data
- Analyzing Hive data using Power View in Excel

MODULE 7: Use Cases and Project Work

PROJECT
Twitter Use Case
- Why companies use Twitter data
- How to analyze Twitter data
- How to set up Twitter API access
- Loading some tweets into HDFS
- Querying Twitter logs

REVISION TEST
50 questions, 30 minutes

HOPE TO SEE YOU ON BOARD SOON.