Hadoop Programming with Java for Big Data Solutions



4 Days

The availability of large data sets presents new opportunities and challenges to organizations of all sizes. In this course, you will implement a strategy for developing Hadoop jobs and extracting business value from large and varied data sets. This Apache Hadoop development training is essential for programmers who want to augment their programming skills to use Hadoop for a variety of big data solutions.

You Will Learn How To

  • Write, customize, and deploy Java MapReduce jobs to summarize data
  • Develop Hive and Pig queries to simplify data analysis
  • Test and debug jobs using MRUnit
  • Monitor task execution and cluster health

Important Course Information

  • Requirements

    • Java experience at the level of:
      • Course 471, Java Programming Introduction, or at least six months of Java programming experience

Course Outline

  • Introduction to Hadoop

  • Identifying the business benefits of Hadoop
  • Surveying the Hadoop ecosystem
  • Selecting a suitable distribution

  • Parallelizing Program Execution

Meeting the challenges of parallel programming

  • Investigating parallelizable challenges: algorithms, data and information exchange
  • Estimating the storage and complexity of Big Data

Parallel programming with MapReduce

  • Dividing and conquering large-scale problems
  • Uncovering jobs suitable for MapReduce
  • Solving typical business problems

  • Implementing Real-World MapReduce Jobs

Applying the Hadoop MapReduce paradigm

  • Configuring the development environment
  • Exploring the Hadoop distribution
  • Creating the components of MapReduce jobs
  • Introducing the Hadoop daemons
  • Analyzing the stages of MapReduce processing: splitting, mapping, shuffling and reducing
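
As a rough preview of the stages named above, the sketch below walks a word count through splitting, mapping, shuffling and reducing in plain Java. It is an illustration only and is not taken from the course materials: it uses no Hadoop classes, and the names `StagesSketch`, `map`, `shuffle`, `reduce` and `wordCount` are hypothetical. A real Hadoop job would distribute these stages across a cluster rather than run them in one process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: plain Java, no Hadoop classes involved.
class StagesSketch {

    // Map stage: emit a (word, 1) pair for every word in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle stage: group every emitted value under its key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: collapse the values collected for one key into a total.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the stages in order; each list element stands in for one input split.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String split : splits) {
            emitted.addAll(map(split));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, jobs=1}
        System.out.println(wordCount(List.of("big data big", "data jobs")));
    }
}
```

Keeping each stage in its own method mirrors how the framework separates the mapper from the reducer, with the grouping step in between handled by the framework itself.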

Building complex MapReduce jobs

  • Selecting and employing multiple mappers and reducers
  • Leveraging built-in mappers, reducers and partitioners
  • Analyzing time series data with secondary sort
  • Streaming tasks through various programming languages

  • Customizing MapReduce

Solving common data manipulation problems

  • Executing algorithms: parallel sorts, joins and searches
  • Analyzing log files, social media data and e-mails

Implementing partitioners and comparators

  • Identifying network-bound, CPU-bound and disk I/O-bound parallel algorithms
  • Dividing the workload efficiently using partitioners
  • Controlling grouping and sort order with comparators
  • Collecting metrics with counters

  • Persisting Big Data with Distributed Data Stores

Making the case for distributed data

  • Achieving high performance data throughput
  • Recovering from media failure through redundancy

Interfacing with Hadoop Distributed File System (HDFS)

  • Breaking down the structure and organization of HDFS
  • Loading raw data and retrieving results
  • Reading and writing data programmatically
  • Manipulating Hadoop SequenceFile types
  • Sharing reference data with DistributedCache

Structuring data with HBase

  • Migrating from structured to unstructured storage
  • Applying NoSQL concepts with schema on read
  • Connecting to HBase from MapReduce jobs
  • Comparing HBase to other types of NoSQL data stores

  • Simplifying Data Analysis with Query Languages

Unleashing the power of SQL with Hive

  • Structuring databases, tables, views and partitions
  • Integrating MapReduce jobs with Hive queries
  • Querying with HiveQL
  • Accessing Hive servers through JDBC
  • Extending HiveQL with User-Defined Functions (UDF)

Executing workflows with Pig

  • Developing Pig Latin scripts to consolidate workflows
  • Integrating Pig queries with Java
  • Interacting with data through the grunt console
  • Extending Pig with User-Defined Functions (UDF)

  • Managing and Deploying Big Data Solutions

Testing and debugging Hadoop code

  • Logging significant events for auditing and debugging
  • Debugging in local mode
  • Validating requirements with MRUnit

Deploying, monitoring and tuning performance

  • Deploying to a production cluster
  • Optimizing performance with administrative tools
  • Monitoring job execution through web user interfaces

Convenient Ways to Attend This Instructor-Led Course

Hassle-Free Enrolment: No advance payment required to reserve your seat.
Tuition due 30 days after you attend your course.

In the Classroom

Live, Online

Private Team Training

In the Classroom — OR — Live, Online

Tuition — Standard: $3285   Government: $2890

Jan 21 - 24 (4 Days)
9:00 AM - 4:30 PM EST
New York / Online (AnyWare)

Feb 18 - 21 (4 Days)
9:00 AM - 4:30 PM EST
Herndon, VA / Online (AnyWare)

Guaranteed to Run

When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event — date, time, location — will run. Guaranteed.

Private Team Training

Enrolling at least 3 people in this course? Consider bringing this (or any course that can be custom designed) to your preferred location as a private team training.

For details, call 1-888-843-8733.

Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

After-Course Computing Sandbox
You'll be given remote access to a preconfigured virtual machine for you to redo your hands-on exercises, develop/test new code, and experiment with the same software used in your course.

Free Course Exam
You can take your Learning Tree course exam on the last day of your course or online any time after class.


Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
*Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm
