
Hadoop Architecture & Administration for Big Data Solutions



Course Number



4 Days


The emergence of large data sets presents new opportunities and challenges to organizations of all sizes. In this Hadoop architecture and administration training course, you gain the skills to install, configure, and manage the Apache Hadoop platform and its associated ecosystem, and to build a Hadoop solution that satisfies your business requirements.

You Will Learn How To

  • Architect a Hadoop solution to satisfy your business requirements
  • Install and build a Hadoop cluster capable of processing large data sets
  • Configure and tune the Hadoop environment to ensure high throughput and availability
  • Allocate, distribute, and manage resources
  • Monitor the file system, job progress, and overall cluster performance

Important Course Information

  • Recommended Experience

    • Knowledge of Linux at the level of:
    • Knowledge of Java at the level of:

Course Outline

  • Introduction to Data Storage and Processing

Installing the Hadoop Distributed File System (HDFS)

  • Defining key design assumptions and architecture
  • Configuring and setting up the file system
  • Issuing commands from the console
  • Reading and writing files

Setting the stage for MapReduce

  • Reviewing the MapReduce approach
  • Introducing the computing daemons
  • Dissecting a MapReduce job

  • Defining Hadoop Cluster Requirements

Planning the architecture

  • Selecting appropriate hardware
  • Designing a scalable cluster

Building the cluster

  • Installing Hadoop daemons
  • Optimizing the network architecture

  • Configuring a Cluster

Preparing HDFS

  • Setting basic configuration parameters
  • Configuring block allocation, redundancy and replication

Deploying MapReduce

  • Installing and setting up the MapReduce environment
  • Delivering redundant load balancing via Rack Awareness

  • Maximizing HDFS Robustness

Creating a fault-tolerant file system

  • Isolating single points of failure
  • Maintaining High Availability
  • Triggering manual failover
  • Automating failover with ZooKeeper

Leveraging NameNode Federation

  • Extending HDFS resources
  • Managing the namespace volumes

Introducing YARN

  • Critiquing the YARN architecture
  • Identifying the new daemons

  • Managing Resources and Cluster Health

Allocating resources

  • Setting quotas to constrain HDFS utilization
  • Prioritizing access to MapReduce using schedulers

Maintaining HDFS

  • Starting and stopping Hadoop daemons
  • Monitoring HDFS status
  • Adding and removing data nodes

Administering MapReduce

  • Managing MapReduce jobs
  • Tracking progress with monitoring tools
  • Commissioning and decommissioning compute nodes

  • Maintaining a Cluster

Employing the standard built-in tools

  • Managing and debugging processes using JVM metrics
  • Performing Hadoop status checks

Tuning with supplementary tools

  • Assessing performance with Ganglia
  • Benchmarking to ensure continued performance

  • Extending Hadoop

Simplifying information access

  • Enabling SQL-like querying with Hive
  • Installing Pig to create MapReduce jobs

Integrating additional elements of the ecosystem

  • Imposing a tabular view on HDFS with HBase
  • Configuring Oozie to schedule workflows

  • Implementing Data Ingress and Egress

Facilitating generic input/output

  • Moving bulk data into and out of Hadoop
  • Transmitting HDFS data over HTTP with WebHDFS

Acquiring application-specific data

  • Collecting multi-sourced log files with Flume
  • Importing and exporting relational information with Sqoop

  • Planning for Backup, Recovery and Security

Coping with inevitable hardware failures

Securing your Hadoop cluster

Convenient Ways to Attend This Instructor-Led Course

Hassle-Free Enrolment: No advance payment required to reserve your seat.
Tuition due 30 days after you attend your course.

In the Classroom

Live, Online

Private Team Training

In the Classroom — OR — Live, Online

Tuition: Standard $3,285; Government $2,890

Sep 3 - 6 (4 Days)
9:00 AM - 4:30 PM EDT
New York / Online (AnyWare)

Oct 22 - 25 (4 Days)
9:00 AM - 4:30 PM EDT
Ottawa / Online (AnyWare)

Jan 21 - 24 (4 Days)
9:00 AM - 4:30 PM EST
Rockville, MD / Online (AnyWare)

Feb 18 - 21 (4 Days)
9:00 AM - 4:30 PM EST
New York / Online (AnyWare)

Apr 14 - 17 (4 Days)
9:00 AM - 4:30 PM EDT
Ottawa / Online (AnyWare)

Guaranteed to Run

When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event — date, time, location — will run. Guaranteed.

Private Team Training

Enrolling at least 3 people in this course? Consider bringing this course (or any course that can be custom-designed) to your preferred location as private team training.

For details, call 1-888-843-8733.


Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

After-Course Computing Sandbox
You'll be given remote access to a preconfigured virtual machine on which you can redo your hands-on exercises, develop and test new code, and experiment with the same software used in your course.

Free Course Exam
You can take your Learning Tree course exam on the last day of your course or online any time after class.


Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
*Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm
