Apache Spark Development training course

A lightning-fast unified analytics engine for big data and machine learning

JBI training course London UK

"Good introduction to Apache Spark. The trainer was great at talking us through the information, specifically optimisation methods. He spoke slowly and concisely which really got his points across. He effectively tailored the course to our specifications which we also appreciated."

RL, Financial Crime Technologist, Apache Spark, April 2021

Public Courses

22/04/24 - 2 days
£1495 +VAT
03/06/24 - 2 days
£1495 +VAT
15/07/24 - 2 days
£1495 +VAT

Customised Courses

* Train a team
* Tailor content
* Flex dates
From £1200 / day

  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Become familiar with basic installation/setup/layout of Spark
  • Use Spark for interactive and ad-hoc operations
  • Use DataSet/DataFrame/Spark SQL to efficiently process structured data
  • Understand the basics of RDDs (Resilient Distributed Datasets), data partitioning, pipelining and computations
  • Understand performance implications and optimisations when using Spark
  • Understand Spark's data caching and usage
  • Become familiar with Spark Graph Processing and SparkML machine learning

Module 1 - Introduction to Spark - Getting started

  1. What is Spark and what is its purpose?
  2. Overview, Motivations, Spark Systems
  3. Spark Ecosystem
  4. Spark vs. Hadoop
  5. Typical Spark Deployment and Usage Environments
  6. Components of the Spark unified stack
  7. Resilient Distributed Dataset (RDD)
  8. Downloading and installing Spark standalone
  9. Python overview
  10. Launching and using the Python shell 

Module 2 - Resilient Distributed Dataset and DataFrames

  1. Understand how to create parallelized collections and external datasets
  2. Work with Resilient Distributed Dataset (RDD) operations
  3. Utilize shared variables and key-value pairs
  4. RDD Concepts, Partitions, Lifecycle, Lazy Evaluation
  5. Working with RDDs - Creating and Transforming (map, filter, etc.)
  6. Caching - Concepts, Storage Type, Guidelines
  7. Introduction and Usage
  8. Creating and Using a DataSet
  9. Working with JSON
  10. Using the DataSet DSL
  11. Using SQL with Spark
  12. Data Formats
  13. Optimizations: Catalyst and Tungsten
  14. DataSets vs. DataFrames vs. RDDs
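
The lazy evaluation named in item 4 can be illustrated with a small plain-Python sketch. This is not the Spark API: `LazyDataset` and its methods are hypothetical names chosen to mirror the idea that transformations (map, filter) only record work, and nothing executes until an action such as `collect` is called.

```python
class LazyDataset:
    """Hypothetical stand-in for an RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=None):
        self._source = source
        self._steps = list(steps or [])

    def map(self, fn):
        # Transformation: returns a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, predicate):
        # Transformation: also only recorded, not executed.
        return LazyDataset(self._source, self._steps + [("filter", predicate)])

    def _iterate(self):
        # Pipelines every record through all recorded steps, one record at a time.
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        # Action: this is the point at which the recorded steps actually run.
        return list(self._iterate())


calls = []  # records every time the mapped function really executes


def square(x):
    calls.append(x)
    return x * x


squares_of_evens = LazyDataset(range(6)).filter(lambda x: x % 2 == 0).map(square)
calls_before_action = list(calls)  # still empty: nothing has run yet
result = squares_of_evens.collect()
print(calls_before_action, result)
```

In this sketch `square` is only invoked once `collect` runs, and only for the records that survive the filter.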

Module 3 - Spark application programming

  1. Understand the purpose and usage of the SparkContext
  2. Initialize Spark with the Python programming language
  3. Describe and run some Spark examples
  4. Pass functions to Spark
  5. Create and run a Spark standalone application
  6. Submit applications to the cluster
  7. Overview, Basic Driver Code, SparkConf
  8. Creating and Using a SparkContext/SparkSession
  9. Building and Running Applications
  10. Application Lifecycle
  11. Cluster Managers
  12. Logging and Debugging

Module 4 - Introduction to Spark libraries

  1. Understand and use the various Spark libraries

Module 5 - Spark configuration, monitoring and tuning

  1. Understand components of the Spark cluster
  2. Configure Spark to modify the Spark properties, environmental variables, or logging properties
  3. Monitor Spark using the web UIs, metrics, and external instrumentation
  4. Understand performance tuning considerations
  5. The Spark UI
  6. Narrow vs. Wide Dependencies
  7. Minimizing Data Processing and Shuffling
  8. Caching - Concepts, Storage Type, Guidelines
  9. Using Caching
  10. Using Broadcast Variables and Accumulators
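
The difference between narrow and wide dependencies (item 6), and why shuffling is worth minimising (item 7), can be sketched in plain Python. This is a simplified model rather than Spark itself: partitions are ordinary lists, and the helper names `map_partitions`, `shuffle_by_key` and `reduce_by_key` are hypothetical.

```python
import zlib
from collections import defaultdict


def map_partitions(partitions, fn):
    # Narrow dependency: each output partition is built from exactly one input partition.
    return [[fn(record) for record in part] for part in partitions]


def shuffle_by_key(partitions, num_partitions):
    # Wide dependency: an output partition may receive records from every input partition.
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            # crc32 gives a hash that is stable between runs, unlike built-in hash() on str.
            target = zlib.crc32(key.encode("utf-8")) % num_partitions
            shuffled[target].append((key, value))
    return shuffled


def reduce_by_key(partitions, num_partitions):
    # After the shuffle all values for a key sit in one partition, so each can be summed locally.
    reduced = []
    for part in shuffle_by_key(partitions, num_partitions):
        totals = defaultdict(int)
        for key, value in part:
            totals[key] += value
        reduced.append(dict(totals))
    return reduced


words = [["spark", "rdd", "spark"], ["rdd", "cache", "spark"]]
pairs = map_partitions(words, lambda word: (word, 1))  # no data movement needed
counts = reduce_by_key(pairs, num_partitions=2)        # requires a shuffle
merged = {key: total for part in counts for key, total in part.items()}
print(merged)
```

The map step never moves records between partitions, whereas the reduce step has to regroup every record by key first, which is the costly part in a real cluster.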

Module 6 - Spark Streaming (optional)

  1. Overview and Streaming Basics
  2. Structured Streaming
  3. DStreams (Discretized Streams)
  4. Architecture, Stateless, Stateful, and Windowed Transformations
  5. Spark Streaming API
  6. Programming and Transformations
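
A windowed transformation (item 4) groups streaming events into fixed time intervals before aggregating them. The following is a minimal plain-Python sketch of a tumbling-window count, not the Spark Streaming API; the function name `tumbling_window_counts` and the sample events are made up for illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping each window start time to its per-key counts.
    """
    windows = {}
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[key] += 1
    return {start: dict(counter) for start, counter in sorted(windows.items())}


events = [(1, "click"), (4, "view"), (9, "click"), (12, "click"), (19, "view")]
window_counts = tumbling_window_counts(events, window_seconds=10)
print(window_counts)
```

A stateless transformation would look at each event on its own; the window here is the small piece of state that lets counts accumulate across events.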

Python or Java/Scala developers who need to learn how to develop Big Data and ML solutions with Apache Spark


5 star

4.8 out of 5 average

“JBI did a great job of customizing their syllabus to suit our business needs and also bringing our team up to speed on the current best practices. Our teams varied widely in terms of experience and the Instructor handled this particularly well - very impressive”

Brian F, Team Lead, RBS, Data Analysis Course, 20 April 2022

Our Apache Spark training course provides you with a solid technical introduction to the Spark architecture and how Spark works.

You will learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level constructs that provide a simpler and more capable interface, including Spark SQL and DataFrames.

CONTACT
+44 (0)20 8446 7555


Copyright © 2023 JBI Training. All Rights Reserved.
JB International Training Ltd  -  Company Registration Number: 08458005
Registered Address: Wohl Enterprise Hub, 2B Redbourne Avenue, London, N3 2BS
