Apache Spark Development Training Course

A lightning-fast unified analytics engine for big data and machine learning

16 Sep London

Capita, Marks and Spencer, Telefonica, Cisco, BBC, Lloyds, Sony

Apache Spark Development training course (code: SPARKDEV)


Our Apache Spark training course provides students with a solid technical introduction to the Spark architecture and how Spark works. Attendees learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level constructs that provide a simpler and more capable interface, including Spark SQL and DataFrames.


This course is aimed at Python or Java/Scala developers who need to learn how to develop Big Data and ML solutions with Apache Spark.



Module 1 - Introduction to Spark - Getting started

  1. What is Spark and what is its purpose?
  2. Overview, Motivations, Spark Systems
  3. Spark Ecosystem
  4. Spark vs. Hadoop
  5. Typical Spark Deployment and Usage Environments
  6. Components of the Spark unified stack
  7. Resilient Distributed Dataset (RDD)
  8. Downloading and installing Spark standalone
  9. Python overview
  10. Launching and using the Python shell 

Module 2 - Resilient Distributed Dataset and DataFrames

  1. Understand how to create parallelized collections and external datasets
  2. Work with Resilient Distributed Dataset (RDD) operations
  3. Utilize shared variables and key-value pairs
  4. RDD Concepts, Partitions, Lifecycle, Lazy Evaluation
  5. Working with RDDs - Creating and Transforming (map, filter, etc.)
  6. Caching - Concepts, Storage Type, Guidelines
  7. Introduction and Usage
  8. Creating and Using a Dataset
  9. Working with JSON
  10. Using the Dataset DSL
  11. Using SQL with Spark
  12. Data Formats
  13. Optimizations: Catalyst and Tungsten
  14. Datasets vs. DataFrames vs. RDDs

Module 3 - Spark application programming

  1. Understand the purpose and usage of the SparkContext
  2. Initialize Spark with the Python programming language
  3. Describe and run some Spark examples
  4. Pass functions to Spark
  5. Create and run a Spark standalone application
  6. Submit applications to the cluster
  7. Overview, Basic Driver Code, SparkConf
  8. Creating and Using a SparkContext/SparkSession
  9. Building and Running Applications
  10. Application Lifecycle
  11. Cluster Managers
  12. Logging and Debugging

Module 4 - Introduction to Spark libraries

  1. Understand and use the various Spark libraries

Module 5 - Spark configuration, monitoring and tuning

  1. Understand components of the Spark cluster
  2. Configure Spark to modify the Spark properties, environmental variables, or logging properties
  3. Monitor Spark using the web UIs, metrics, and external instrumentation
  4. Understand performance tuning considerations
  5. The Spark UI
  6. Narrow vs. Wide Dependencies
  7. Minimizing Data Processing and Shuffling
  8. Caching - Concepts, Storage Type, Guidelines
  9. Using Caching
  10. Using Broadcast Variables and Accumulators

Module 6 - Spark Streaming (optional)

  1. Overview and Streaming Basics
  2. Structured Streaming
  3. DStreams (Discretized Streams)
  4. Architecture, Stateless, Stateful, and Windowed Transformations
  5. Spark Streaming API
  6. Programming and Transformations

  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Be familiar with the basic installation, setup, and layout of Spark
  • Use Spark for interactive and ad-hoc operations
  • Use Dataset/DataFrame/Spark SQL to efficiently process structured data
  • Understand the basics of RDDs (Resilient Distributed Datasets), data partitioning, pipelining, and computations
  • Understand Spark's data caching and its usage
  • Understand performance implications and optimizations when using Spark
  • Be familiar with Spark Graph Processing and SparkML machine learning



16th Sep 2019 - 2 days £1495





Bring a JBI course to your office and train a whole team onsite: 0800 028 6400

You can customise this course to suit your exact needs: 0800 028 6400


Why JBI?

► "Great technology tips"
► "Access to exclusive content"
► "Short course means less time off"

► "Inspiring trainers"
► "Joined via web"
► "Knowledgeable sales staff"
