"Our tailored course provided a well rounded introduction and also covered some intermediate level topics that we needed to know. Clive gave us some best practice ideas and tips to take away. Fast paced but the instructor never lost any of the delegates"
Brian Leek, Data Analyst, May 2022
Why AI testing is different: non-determinism, semantic correctness, and exactly where assert statements break down
Defining your rubric: a workshop to build shared scoring criteria for accuracy, tone, helpfulness, and safety
Test dataset construction: sourcing real prompts from production logs, writing expected outputs, and covering the important edge cases
LLM-as-judge setup: configuring a second model to score outputs automatically and understanding when to trust that score
Structured output validation: JSON schema checks, field-level assertions, and format regression tests that run in milliseconds
Hallucination detection lab: reference matching techniques, factual grounding checks, and a confidence scoring approach you can implement today
Regression suite build: running your full evaluation set on every prompt change and alerting when scores drop below threshold
Metrics dashboard: tracking pass rates, average quality scores, and trends over time in a lightweight tool
CI/CD integration: adding evaluation runs to your existing pipeline and configuring deploy blocks on quality failures
Stakeholder reporting: translating evaluation scores into plain-language quality summaries that non-technical audiences can act on
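As a rough illustration of the structured output validation topic in the outline above, the sketch below checks a model's JSON output for required fields, types, and one field-level rule. The field names (`answer`, `confidence`, `sources`) and the function `validate_output` are hypothetical and not taken from the course material; a real project might use a full JSON Schema validator instead.

```python
import json

# Hypothetical output contract: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "answer": str,
    "confidence": (int, float),
    "sources": list,
}


def validate_output(raw: str) -> list[str]:
    """Return a list of problems found in a model's JSON output.

    An empty list means the output passed every check.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for {field}")

    # Field-level assertion: confidence must lie in [0, 1].
    conf = data.get("confidence")
    if isinstance(conf, (int, float)) and not 0.0 <= conf <= 1.0:
        problems.append("confidence out of range")
    return problems
```

Checks like this need no model call, so they can run on every output as a fast first filter before slower quality scoring.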
Developers and Engineers
This practical course teaches how to design and implement robust testing and evaluation systems for AI applications in production environments.
Participants will explore why AI testing differs from traditional software testing due to non-determinism and semantic variability in outputs.
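To make the point concrete, here is a minimal sketch of why an exact-match assertion fails on two equally correct answers. The example strings and the helper `contains_required_facts` are invented for illustration, and the keyword check is only a crude stand-in for real semantic evaluation.

```python
import re


def normalize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def contains_required_facts(output: str, required: set[str]) -> bool:
    """True if every required keyword appears somewhere in the output."""
    return required <= normalize(output)


# Two hypothetical runs of the same prompt: both correct, worded differently.
run_a = "The capital of France is Paris."
run_b = "Paris is France's capital city."

# A traditional equality assertion treats the second run as a failure.
exact_match = run_a == run_b

# A looser check accepts both, but still rejects a wrong answer.
required = {"paris", "france", "capital"}
both_pass = all(contains_required_facts(r, required) for r in (run_a, run_b))
```

Here `exact_match` is false while `both_pass` is true, which is the gap that rubric-based and model-based scoring aim to close.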
The course covers how to define evaluation rubrics that measure accuracy, tone, safety, and usefulness in a consistent and repeatable way.
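One way such a rubric could be written down is as weighted criteria combined into a single score. The sketch below is an assumption about how that might look: the weights, the 0 to 5 rating scale, and the names `Criterion`, `RUBRIC`, and `weighted_score` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    description: str


# Hypothetical weights; a real team would agree these in a workshop.
RUBRIC = [
    Criterion("accuracy", 0.4, "Facts match the reference answer"),
    Criterion("tone", 0.1, "Register suits the audience"),
    Criterion("safety", 0.3, "No harmful or disallowed content"),
    Criterion("usefulness", 0.2, "Answer addresses the user's need"),
]


def weighted_score(ratings: dict[str, int], rubric=RUBRIC, scale_max: int = 5) -> float:
    """Combine per-criterion ratings (0..scale_max) into a score in [0, 1]."""
    missing = [c.name for c in rubric if c.name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    total_weight = sum(c.weight for c in rubric)
    weighted = sum(c.weight * ratings[c.name] / scale_max for c in rubric)
    return weighted / total_weight
```

Writing the rubric as data keeps the criteria visible to reviewers and lets the same definition drive both human scoring and automated judges.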
Learners will build structured test datasets using real production data and edge cases to ensure meaningful evaluation coverage.
Hands-on labs include automated validation, hallucination detection, regression testing, and LLM-as-judge evaluation techniques.
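As one hedged example of what a grounding check might look like, the sketch below flags answer sentences whose words mostly do not appear in a reference text. This is a simple lexical-overlap heuristic chosen for illustration, not a method confirmed by the course; the function name `ungrounded_sentences` and the 0.6 threshold are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(answer: str, reference: str, min_overlap: float = 0.6) -> list[str]:
    """Return answer sentences with low word overlap against the reference.

    Overlap is the fraction of a sentence's distinct tokens that also
    occur in the reference. Sentences below min_overlap are flagged.
    """
    ref_tokens = tokens(reference)
    flagged = []
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        sent_tokens = tokens(sentence)
        if not sent_tokens:
            continue
        overlap = len(sent_tokens & ref_tokens) / len(sent_tokens)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged
```

A heuristic like this misses paraphrases and can be fooled by reused vocabulary, so it is best treated as a cheap signal to combine with other checks.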
The course also shows how to integrate AI testing into CI/CD pipelines, with dashboards and reporting to monitor quality over time.
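A pipeline quality gate of this kind could be sketched as a function that compares the latest evaluation scores with a stored baseline. The function name `gate` and every threshold below are hypothetical defaults chosen for illustration.

```python
def gate(
    scores: list[float],
    baseline_mean: float,
    pass_mark: float = 0.7,
    min_pass_rate: float = 0.9,
    max_drop: float = 0.05,
) -> tuple[bool, str]:
    """Decide whether a deploy should proceed, with a reason.

    Fails if too few test cases reach pass_mark, or if the mean score
    has fallen more than max_drop below the baseline mean.
    """
    if not scores:
        return False, "no evaluation results"
    pass_rate = sum(s >= pass_mark for s in scores) / len(scores)
    mean = sum(scores) / len(scores)
    if pass_rate < min_pass_rate:
        return False, f"pass rate {pass_rate:.2f} below {min_pass_rate:.2f}"
    if baseline_mean - mean > max_drop:
        return False, f"mean score {mean:.2f} dropped from baseline {baseline_mean:.2f}"
    return True, "ok"
```

In a pipeline step, a small wrapper script could call `gate` and exit with a non-zero status when it returns false, which most CI systems treat as a failed job.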
By the end of the course, participants will be able to build scalable evaluation frameworks that ensure AI systems remain reliable and production-ready.
CONTACT
+44 (0)20 8446 7555
Copyright © 2025 JBI Training. All Rights Reserved.
JB International Training Ltd - Company Registration Number: 08458005
Registered Address: Wohl Enterprise Hub, 2B Redbourne Avenue, London, N3 2BS