Question 1

What Apache training courses does JBI Training offer?

Accepted Answer

JBI Training offers four Apache courses covering the most widely used Apache data processing and streaming technologies. Available courses are Apache Spark Development (two days), Apache Spark 3 — Databricks Certified Associate Developer (five days), Apache Kafka Essentials (two days), and Apache Storm (two days). All courses are available as scheduled classroom sessions in London, as live online instructor-led training, or as customised onsite programmes for data engineering and platform teams.

Question 2

What is Apache Spark and what is it used for?

Accepted Answer

Apache Spark is an open-source, distributed data processing engine designed for large-scale data analytics and transformation workloads. It processes data in memory across a cluster of machines, making it significantly faster than older batch processing frameworks such as Hadoop MapReduce for most workloads. Spark is used for large-scale ETL and data transformation pipelines, machine learning at scale using the MLlib library, graph processing, stream processing using Spark Structured Streaming, and interactive data analysis. It is one of the most widely adopted distributed data processing frameworks in the industry and is available on all major cloud platforms, including Azure Databricks, Amazon EMR, and Google Cloud Dataproc.
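
As a rough illustration of the partition-and-merge idea behind distributed engines such as Spark, the sketch below counts words in plain Python. It does not use the Spark API: the function names and input lines are made up for this example, and threads stand in for the executors a real cluster would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_partitions(rows: List[str], num_partitions: int) -> List[List[str]]:
    """Deal rows out into roughly equal partitions."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def count_words(partition: List[str]) -> Counter:
    """Work done independently on one partition: a partial word count."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


lines = ["spark kafka", "spark storm", "kafka spark"]  # made-up input
partitions = split_into_partitions(lines, num_partitions=2)

# Assumption: threads stand in for cluster executors in this sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(count_words, partitions))

# Merge the partial results from every partition into the final answer.
word_counts = sum(partials, Counter())
```

Each partition is processed without reference to the others, which is what allows the work to be spread across machines; only the final merge needs to see every partial result.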

Question 3

What is the difference between the Apache Spark Development course and the Apache Spark 3 Databricks Certified Associate Developer course?

Accepted Answer

The Apache Spark Development course is a two-day practical introduction to Spark for data engineers and developers who need to build and run Spark workloads. It covers the Spark architecture, the DataFrame API, Spark SQL, data transformation and aggregation, reading and writing data in various formats, and an introduction to Structured Streaming. The Apache Spark 3 — Databricks Certified Associate Developer course is a comprehensive five-day programme that covers Spark 3 in full depth and prepares delegates for the Databricks Certified Associate Developer for Apache Spark certification examination. It includes advanced Spark topics, Databricks-specific features, performance tuning, and certification-focused preparation. The five-day course is suited to data engineers who want a thorough grounding in Spark 3 and a recognised professional credential.

Question 4

What is Apache Kafka and what does the Kafka Essentials course cover?

Accepted Answer

Apache Kafka is an open-source distributed event streaming platform designed to handle high-throughput, fault-tolerant, real-time data streams. It acts as a highly scalable message broker that allows applications to publish, subscribe to, store, and process streams of events in real time. Kafka is widely used for building real-time data pipelines, event-driven microservices architectures, activity tracking, operational monitoring, and stream processing applications. JBI's two-day Apache Kafka Essentials course covers the Kafka architecture and core concepts — including topics, partitions, producers, consumers, and consumer groups — setting up and configuring Kafka, producing and consuming messages, Kafka Connect for integrating with external systems, Kafka Streams for stream processing, and operational and monitoring considerations for running Kafka in production.
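
The relationship between keys, partitions, and consumer groups can be sketched in plain Python. This is a simplified model under stated assumptions, not the Kafka client API: the CRC32 hash is a stand-in for Kafka's own partitioner, the round-robin assignment is only one possible strategy, and the topic size, keys, and consumer names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 3  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition with a stable hash.

    Assumption: CRC32 is a stand-in; Kafka's default partitioner uses a different hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(consumers: List[str],
                      num_partitions: int = NUM_PARTITIONS) -> Dict[str, List[int]]:
    """Spread partitions over the members of one consumer group, round-robin."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Producer side: append each event to the log of the partition chosen by its key.
topic_log = defaultdict(list)
events = [("user-1", "login"), ("user-2", "login"), ("user-1", "click"), ("user-1", "logout")]
for key, value in events:
    topic_log[partition_for(key)].append((key, value))

# Consumer side: each partition is read by exactly one consumer in the group.
group = assign_partitions(["consumer-a", "consumer-b"])
```

Because every event with the same key lands in the same partition, the events for `user-1` keep their order, and adding consumers to the group spreads the partitions across them.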

Question 5

What is Apache Storm and how does it differ from Spark Streaming?

Accepted Answer

Apache Storm is an open-source distributed real-time computation system designed for processing unbounded streams of data with very low latency. It processes individual events as they arrive, making it well-suited to use cases that require immediate, sub-second processing of each event, such as fraud detection, real-time alerting, and financial transaction processing. Spark Structured Streaming, by default, processes data in micro-batches, introducing a small amount of latency in exchange for higher throughput and easier integration with the rest of the Spark ecosystem. The choice between Storm and Spark Structured Streaming depends on latency requirements, existing tooling, and the nature of the streaming workload. JBI's two-day Apache Storm course covers Storm's topology model, spouts and bolts, fault tolerance, state management, and practical stream processing use cases.
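
The difference between per-event and micro-batch processing can be sketched in plain Python. Neither function uses the Storm or Spark APIs, and the names, amounts, and threshold are made up for illustration. The sketch also assumes batches are cut by event count to keep it short, whereas real micro-batch engines typically trigger on a time interval.

```python
from typing import Callable, Iterable, Iterator, List


def per_event(stream: Iterable[int], handle: Callable[[int], bool]) -> Iterator[bool]:
    """Per-event style: handle each event as soon as it arrives."""
    for event in stream:
        yield handle(event)


def micro_batch(stream: Iterable[int], batch_size: int,
                handle_batch: Callable[[List[int]], int]) -> Iterator[int]:
    """Micro-batch style: buffer events, then process each group in one go."""
    batch: List[int] = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield handle_batch(batch)


amounts = [5, 120, 30, 990, 15]  # hypothetical transaction amounts

# One result per event: flag any amount over a made-up threshold of 100.
flags = list(per_event(amounts, lambda a: a > 100))

# One result per batch of two events: total amount in the batch.
totals = list(micro_batch(amounts, 2, sum))
```

In the per-event version a result is available the moment each event is seen, while in the micro-batch version the first event waits until its batch is full, which is the latency trade-off described above.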

Question 6

What is the Databricks Certified Associate Developer certification and does the course prepare for it?

Accepted Answer

The Databricks Certified Associate Developer for Apache Spark is a professional certification that validates a developer's ability to use the Spark DataFrame API, Spark SQL, and Spark's core processing capabilities at an associate level. It is widely recognised in the data engineering community and is particularly relevant for professionals working in Azure Databricks, AWS, or Google Cloud environments. JBI's five-day Apache Spark 3 — Databricks Certified Associate Developer course is specifically designed to prepare delegates for this examination, covering the full scope of the certification syllabus with hands-on exercises, practice questions, and exam technique guidance alongside comprehensive technical content.

Question 7

Can Apache training courses be customised for a corporate data engineering team?

Accepted Answer

Yes. All Apache courses at JBI can be delivered as customised onsite or online programmes for corporate data engineering and platform teams. Content and exercises can be tailored to the team's existing data stack, cloud environment, and specific use cases — for example, a team using Azure Databricks can receive Spark training focused on the Databricks environment, or a team building an event-driven microservices architecture can receive Kafka training focused on their specific integration patterns. JBI has delivered data engineering and Apache ecosystem training for teams at organisations including the BBC, NHS, RBS, Sky, EDF, and Cisco.

Question 8

Are Apache Spark and Kafka courses kept up to date with the latest releases?

Accepted Answer

Yes. The Apache ecosystem evolves continuously: regular Spark releases add features to the DataFrame API, Structured Streaming, and MLlib, while Kafka development continues with updates to Kafka Streams, the KRaft consensus protocol that replaces ZooKeeper, and new connector capabilities. JBI's Apache training content is continuously reviewed and updated to reflect the latest stable versions of Spark and Kafka, current Databricks platform features, and evolving best practices in data engineering and real-time streaming. Delegates learn skills that are current and directly applicable to the versions and tools used in professional data engineering environments today.

Apache Spark Development training course

A lightning-fast unified analytics engine for big data and machine learning

Course Details

Module 1 - Introduction to Spark - Getting started

Module 2 - Resilient Distributed Dataset and DataFrames

Module 3 - Spark application programming

Module 4 - Introduction to Spark libraries

Module 5 - Spark configuration, monitoring and tuning

Module 6 - Spark Streaming (optional)

Feedback

4.8 out of 5 average
