Modules |
Topic |
Description |
Module 1 |
Introduction and Overview of Hadoop |
- What is Hadoop?
- History of Hadoop
- Building Blocks - Hadoop Eco-System
- Who is behind Hadoop?
- What Hadoop is good for and what it is not
- Parallel Computing vs. Distributed Computing
- How to configure Hadoop on your system
- NameNode architecture (EditLog, FsImage, location of replicas)
- Secondary NameNode architecture
- DataNode architecture
|
Module 2 |
Hadoop Distributed FileSystem (HDFS) |
- HDFS Overview and Architecture
- HDFS Installation
- HDFS Use Cases
- Hadoop FileSystem Shell
- FileSystem Java API
|
Module 3 |
HBase - The Hadoop Database |
- HBase Overview and Architecture
- HBase Installation
- HBase Shell
- Java Client API
- Java Administrative API
- Filters
- Scan Caching and Batching
- Key Design
- Table Design
|
Module 4 |
MapReduce 2.0 / YARN |
- MapReduce 2.0 and YARN Overview
- MapReduce 2.0 and YARN Architecture
- Installation
- Input and Output Formats
- Job Scheduling (FIFO, Fair Scheduler, Capacity Scheduler)
- HDFS and HBase as Source and Sink
- Job Configuration
- Job Submission and Monitoring
- Anatomy of Job Execution on YARN
- Distributed Cache
- Hadoop Streaming
|
Module 5 |
Hadoop Developer Tasks |
- Writing a MapReduce program
- Reading and writing data using Java
- Hadoop Eclipse integration
- Mapper in detail
- Reducer in detail
- Using Combiners
- Reducing Intermediate Data with Combiners
- Writing Partitioners for Better Load Balancing
- Sorting in HDFS
- Searching in HDFS
- Indexing in HDFS
- Hands-On Exercise
|
Module 6 |
Hadoop Administrative Tasks |
- Routine Administrative Procedures
- Understanding dfsadmin and mradmin
- Block Scanner, Balancer
- Health Check and Safe Mode
- DataNode commissioning/decommissioning
- Monitoring and Debugging on a production cluster
- NameNode Backup and Recovery
- Upgrading Hadoop
|
Module 7 |
MapReduce Workflows |
- Decomposing Problems into MapReduce Workflow
- Using JobControl
- Oozie Introduction and Architecture
- Oozie Installation
- Developing, Deploying, and Executing Oozie Workflows
|
Module 8 |
Pig |
- Pig Overview
- Installation
- Pig Latin
- Developing Pig Scripts
- Processing Big Data with Pig
- Joining data-sets with Pig
|
Module 9 |
Inheritance |
- Types of inheritance
- Advantage of inheritance
- Single inheritance
- Multilevel inheritance
- Hierarchical inheritance
- Overriding methods
- Runtime polymorphism
|
Module 10 |
Hive |
- Hive Overview
- Installation
- HiveQL
|