Pentaho Data Integration training course

A 3-day introduction to data transformation using Pentaho Data Integration (PDI), from the very beginning all the way to developing an ETL framework that ingests files of varying structure.

"The content made the course enjoyable and I liked that the course structure allowed me to spend a lot of time "hands-on" with the programming interface. Having only used Pentaho DI briefly up to this point, the hands-on nature of this course helped improve my confidence in using the toolset.
The trainer was very engaging and his knowledge of the subject matter was very impressive. He spent time answering questions throughout the course - which was much appreciated.
It would have been impossible to cover everything in a 3 day course, but I think we covered most topics, but more impressive was that the course material allowed me to implement some quite advanced solutions in a relatively small amount of time!
All in all, I think this is a good, well-structured course."

DF, Software Engineer Lead, Nov 2021

Public Courses

21/04/25 - 3 days
£2000 +VAT
02/06/25 - 3 days
£2000 +VAT
14/07/25 - 3 days
£2000 +VAT

Customised Courses

* Train a team
* Tailor content
* Flex dates
From £1200 / day

  • Transformations
    • Input and output steps
    • Field transformations
    • Joins and lookups
    • Set transformations
    • JSON and XML inputs
    • Variables and portability
    • Logging and performance
    • Metadata injection
  • Jobs
    • Basic orchestration
    • File and database management
    • Iteration and looping in jobs


  • Installing and starting PDI. The user interface

Part I – Transformations

Input and output steps

  • Exploration of the various ways to read data into, and write data out of, PDI: CSV files, Excel files, SQL queries, etc. Installing JDBC drivers.
  • Lab 1: CSV Input, MySQL output

Field transformations

  • Overview of various transformation steps: Calculator, string manipulation, adding counters, value mapping, handling nulls, JavaScript and regular expressions.

Joins and lookups

  • Merging two or more data streams and combining the data: managing slowly changing dimensions, in-memory and database lookups, querying HTTP services/APIs, merge joins, row diff, etc.
  • Lab 2: Joins and lookups (enriching data stream)

Set transformations

  • Operations on groups of rows: sorting, grouping, splitting fields into rows, normalising/denormalising data, cloning, appending.
  • Lab 3: Grouping data
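
As a rough illustration of what two of these row-level operations do to a data stream, here is a minimal sketch in plain Python. It is not PDI code and not course material; the sample rows, field names and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"customer": "A", "tags": "red;blue", "amount": 10},
    {"customer": "B", "tags": "green", "amount": 5},
    {"customer": "A", "tags": "blue", "amount": 7},
]


def split_field_to_rows(rows, field, delimiter):
    """Emit one output row per delimited value found in `field`."""
    out = []
    for row in rows:
        for value in row[field].split(delimiter):
            new_row = dict(row)  # copy so the input row is left untouched
            new_row[field] = value
            out.append(new_row)
    return out


def group_sum(rows, key, value):
    """Sum `value` for each distinct `key`, returned sorted by key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return sorted(totals.items())


split_rows = split_field_to_rows(rows, "tags", ";")
totals = group_sum(rows, "customer", "amount")
```

In PDI the same effect is configured in steps on the canvas rather than written by hand; the sketch only shows the shape of the data before and after.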

JSON and XML inputs

  • Reading XML data via XPath and using the fast StAX streaming parser. JSON parsing via JSONPath.
  • Lab 4: JSON and XML inputs (XPath, StAX parser, JSONPath)
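
The difference between path-based and streaming XML reading can be sketched outside PDI with Python's standard library. This is an analogy only: StAX is a Java API, and `iterparse` is used here merely as a comparable streaming style. The sample documents are hypothetical, and the JSON part is a hand-written equivalent of the JSONPath expression `$.orders[*].id` rather than a real JSONPath engine.

```python
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample documents.
XML = b"""<orders>
  <order status="open"><id>1</id></order>
  <order status="closed"><id>2</id></order>
  <order status="open"><id>3</id></order>
</orders>"""
JSON = '{"orders": [{"id": 1}, {"id": 2}, {"id": 3}]}'

# Path-based: load the whole document, then select nodes with an
# XPath expression (ElementTree supports a subset of XPath).
root = ET.fromstring(XML)
open_ids_xpath = [e.text for e in root.findall("./order[@status='open']/id")]

# Streaming: handle each <order> as its end tag is read, then clear it
# so memory use stays flat on large files.
open_ids_stream = []
for _event, elem in ET.iterparse(io.BytesIO(XML), events=("end",)):
    if elem.tag == "order":
        if elem.get("status") == "open":
            open_ids_stream.append(elem.findtext("id"))
        elem.clear()

# JSON: walk the parsed structure by hand, mirroring $.orders[*].id.
json_ids = [order["id"] for order in json.loads(JSON)["orders"]]
```

Both XML approaches return the same values; the streaming form trades the convenience of path expressions for lower memory use on large inputs.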

Variables and portability

  • Setting and getting variables; global variables, runtime variables, parameters; portable connections, file paths, and other best practices
  • Lab 5: Portable transformations

Logging and performance

  • Reading PDI logs; analysing performance and runtime metrics; examples of fast and slow steps; identifying bottlenecks; running step copies in parallel

Metadata injection

  • Use cases for metadata injection. Modifying metadata at runtime. Advanced metadata injection options.
  • Lab 6: Flexible CSV loading

Part II – Jobs

Basic orchestration

  • Using PDI jobs to orchestrate tasks; overview of job entries: sub-jobs, sub-transformations, SQL, shell scripts, conditions, error handling, getting/putting files, etc. Wrapper jobs.

File and database management

  • Using lock files; downloading and archiving files; checking database connections; conditionally creating, dropping or modifying database structures; error handling; recording execution results
  • Lab 7: Building a simple job

Iteration and looping in jobs

  • Running a job/transformation for each file in a folder; handling different file types in one go; iterating over API results; looping until a condition is met; running .sh or .bat scripts depending on the OS
  • Lab 8: Developing a powerful ETL framework

New users charged with using Pentaho, or existing users looking to formalise their knowledge: Business Analysts, Data Analysts and ETL developers.

5 star

4.8 out of 5 average

“The trainer was very knowledgeable and was able to help with every issue throughout the entire course.”

JS, Software Engineer, Nov 2022



