Apache Spark Programming with Scala for Big Data Solutions



4 Days

Supercharge your data with Apache Spark, a big data platform well suited to the iterative algorithms required by graph analytics and machine learning. In this training course, you will learn to apply Spark best practices, develop solutions that run on the Apache Spark platform, and take advantage of Spark’s efficient use of memory and powerful programming model.

You Will Learn How To

  • Develop applications with Spark
  • Work with the libraries for SQL, Streaming, and Machine Learning
  • Map real-world problems to parallel algorithms
  • Build business applications that integrate with Spark

Important Course Information

  • Requirements

    • Professional programming experience at the level of three to six months of work in an object-oriented programming language

Course Outline

Introduction to Spark

  • Defining Big Data and Big Computation
  • What is Spark?
  • What are the benefits of Spark?

The Challenge of Parallelizing Applications

Scaling-out applications

  • Identifying the performance limitations of a modern CPU
  • Scaling traditional parallel processing models

Designing parallel algorithms

  • Fostering parallelism through functional programming
  • Mapping real-world problems to effective parallel algorithms
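
As an informal illustration of the functional style this topic refers to, the following sketch counts words using only the Scala standard library, not Spark itself. The object name `WordCountSketch`, the helper names, and the sample sentences are assumptions made up for this example. Because each function is pure and works on immutable collections, the per-partition counts can be computed independently and merged afterwards.

```scala
object WordCountSketch {
  // Pure function: no shared mutable state, so every line can be tokenized independently.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // "Map" side: count the words inside a single partition of lines.
  def countPartition(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(tokenize).groupBy(identity).map { case (word, hits) => word -> hits.size }

  // "Reduce" side: merge two partial results by adding counts per word.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) => acc.updated(word, acc.getOrElse(word, 0) + n) }

  // Count each partition on its own, then fold the partial counts together.
  def wordCount(partitions: Seq[Seq[String]]): Map[String, Int] =
    partitions.map(countPartition).foldLeft(Map.empty[String, Int])(merge)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, split into two "partitions".
    val partitions = Seq(
      Seq("spark makes big data simple"),
      Seq("big data big compute")
    )
    println(wordCount(partitions))
  }
}
```

In this sketch the partitions are processed one after another on a single machine; the point is only that nothing in `countPartition` or `merge` depends on processing order, which is the property that lets a framework distribute the work.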

Defining the Spark Architecture

Parallelizing data structures

  • Partitioning data across the cluster using Resilient Distributed Datasets (RDD) and DataFrames
  • Apportioning task execution across multiple nodes
  • Running applications with the Spark execution model

The anatomy of a Spark cluster

  • Creating resilient and fault-tolerant clusters
  • Achieving scalable distributed storage

Managing the cluster

  • Monitoring and administering Spark applications
  • Visualizing execution plans and results

Developing Spark Applications

Selecting the development environment

  • Performing exploratory programming via the Spark shell
  • Building stand-alone Spark applications

Working with the Spark APIs

  • Programming with Scala and other supported languages
  • Building applications with the core APIs
  • Enriching applications with the bundled libraries

Manipulating Structured Data with Spark SQL

Querying structured data

  • Processing queries with DataFrames and embedded SQL
  • Extending SQL with User-Defined Functions (UDFs)
  • Exploiting Parquet and JSON formatted data sets

Integrating with external systems

  • Connecting to databases with JDBC
  • Executing Hive queries in external applications

Processing Streaming Data in Spark

What is streaming?

  • Implementing sliding window operations
  • Determining state from continuous data
  • Processing simultaneous streams
  • Improving performance and reliability
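
To give a feel for what a sliding window operation computes, here is a minimal sketch that uses plain Scala collections rather than the Spark streaming API. Each inner sequence stands in for the records that arrived during one batch interval; the names `SlidingWindowSketch` and `windowSums` and the sample numbers are hypothetical. A real streaming job would define the window length and slide interval as durations, whereas this sketch counts them in batches.

```scala
object SlidingWindowSketch {
  // Each Batch stands in for the records received during one batch interval.
  type Batch = Seq[Int]

  // Sum all records in every window of `windowLength` batches,
  // advancing `slide` batches between consecutive windows.
  def windowSums(batches: Seq[Batch], windowLength: Int, slide: Int): List[Int] =
    batches.sliding(windowLength, slide).map(_.flatten.sum).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical input: five batches of integer readings.
    val batches = Seq(Seq(1, 2), Seq(3), Seq(4, 5), Seq(6), Seq(7))
    println(windowSums(batches, windowLength = 3, slide = 2)) // prints List(15, 22)
  }
}
```

Note that Scala's `sliding` emits a shorter final window when the remaining batches do not fill a whole window, so the edge behavior here is a property of this sketch, not of Spark.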

Streaming data sources

  • Streaming from built-in sources (e.g., log files, Twitter, sockets, Kinesis, Kafka)
  • Developing custom receivers
  • Processing with the streaming API and Spark SQL

Performing Machine Learning with Spark

Classifying observations

  • Predicting outcomes with supervised learning
  • Building a decision tree classifier

Identifying patterns

  • Grouping data using unsupervised learning
  • Clustering with the k-means method

Creating Real-World Applications

Building Spark-based business applications

  • Exposing Spark via a RESTful web service
  • Generating Spark-based dashboards

Spark as a service

  • Cloud vs. on-premises
  • Choosing a service provider (e.g., AWS, Azure, Databricks)

The Future of Spark

  • Scaling to massive cluster sizes
  • Enhancing security on multi-tenant clusters
  • Tracking the ongoing commercialization of Spark
  • Project Tungsten: pushing performance closer to the limits of modern hardware
  • Working with existing projects powered by Spark
  • Re-architecting Spark for mobile platforms

Convenient Ways to Attend This Instructor-Led Course

Hassle-Free Enrolment: No advance payment required to reserve your seat.
Tuition due 30 days after you attend your course.

In the Classroom

Live, Online

Private Team Training

In the Classroom — OR — Live, Online

Tuition — Standard: $3285   Government: $2890

Sep 3 - 6 (4 Days)
9:00 AM - 4:30 PM EDT
New York / Online (AnyWare)

Oct 29 - Nov 1 (4 Days)
9:00 AM - 4:30 PM EDT
Greenbelt, MD / Online (AnyWare)

Jan 7 - 10 (4 Days)
9:00 AM - 4:30 PM EST
Herndon, VA / Online (AnyWare)

Feb 11 - 14 (4 Days)
9:00 AM - 4:30 PM EST
Greenbelt, MD / Online (AnyWare)

Mar 3 - 6 (4 Days)
9:00 AM - 4:30 PM EST
New York / Online (AnyWare)

Apr 28 - May 1 (4 Days)
9:00 AM - 4:30 PM EDT
Greenbelt, MD / Online (AnyWare)

Jun 23 - 26 (4 Days)
9:00 AM - 4:30 PM EDT
Herndon, VA / Online (AnyWare)


Guaranteed to Run

When you see the "Guaranteed to Run" icon next to a course event, you can rest assured that your course event — date, time, location — will run. Guaranteed.

Private Team Training

Enrolling at least 3 people in this course? Consider bringing this course (or any course that can be custom designed) to your preferred location as private team training.

For details, call 1-888-843-8733.

Course Tuition Includes:

After-Course Instructor Coaching
When you return to work, you are entitled to schedule a free coaching session with your instructor for help and guidance as you apply your new skills.

After-Course Computing Sandbox
You'll be given remote access to a preconfigured virtual machine so you can redo your hands-on exercises, develop and test new code, and experiment with the same software used in your course.

Free Course Exam
You can take your Learning Tree course exam on the last day of your course or online any time after class.


Training Hours

Standard Course Hours: 9:00 am – 4:30 pm
*Informal discussion with instructor about your projects or areas of special interest: 4:30 pm – 5:30 pm
