"Our tailored course provided a well rounded introduction and also covered some intermediate level topics that we needed to know. Clive gave us some best practice ideas and tips to take away. Fast paced but the instructor never lost any of the delegates"
Brian Leek, Data Analyst, May 2022
AI failure taxonomy: silent degradation, sudden regression, and data drift explained through case studies from real production systems
Instrumentation lab: adding structured logging, metric emission, and trace IDs to an existing AI application with minimal code changes
Baseline setting: calculating normal output quality distributions so degradation triggers meaningful alerts rather than noise
Data drift detection: applying statistical tests to identify when incoming features or text inputs have shifted from the training distribution
Dashboard build lab: constructing a real-time AI health view in Grafana or Power BI from scratch using your instrumented data
Alerting configuration: setting sensible thresholds, avoiding alert fatigue through grouping, and routing alerts to the right person
Incident investigation lab: given a set of production logs from a degraded system, participants find and explain the root cause
Feedback loop design: capturing user corrections, escalations, and negative signals and routing them back into your evaluation pipeline
Scheduled evaluation: running your held-out test set on a nightly schedule and reporting quality score trends in a digest email
Runbook writing workshop: producing a clear on-call guide that a non-ML engineer can follow when facing an AI production incident
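To give a flavour of the instrumentation topic in the outline above, here is a minimal sketch of structured logging with a trace ID, using only the Python standard library. It is an illustration rather than course material: the names `JsonFormatter`, `make_logger`, `instrumented` and `fake_model` are hypothetical, and `fake_model` stands in for whatever prediction function an existing application already has.

```python
import json
import logging
import time
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(payload)


def make_logger(name="ai_app"):
    """Build a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def instrumented(model_fn, logger):
    """Wrap an existing prediction function without editing its body."""

    def wrapper(prompt):
        trace_id = uuid.uuid4().hex  # one ID per request
        start = time.perf_counter()
        result = model_fn(prompt)
        latency_ms = round((time.perf_counter() - start) * 1000, 2)
        logger.info(
            "prediction served",
            extra={"trace_id": trace_id, "latency_ms": latency_ms},
        )
        return result

    return wrapper


def fake_model(prompt):
    # Hypothetical stand-in for a real model call.
    return prompt.upper()


predict = instrumented(fake_model, make_logger())
answer = predict("hello")
```

Because the wrapper keeps the original call signature, callers of the prediction function need no changes; each request simply emits one JSON line carrying a trace ID and a latency figure that a dashboard or alert rule could consume.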
This practical course teaches how to monitor, evaluate, and maintain AI systems in production to ensure long-term reliability and performance.
Participants will explore common AI failure modes such as drift, regression, and silent degradation using real-world case studies.
The course covers instrumentation techniques for adding logging, metrics, and tracing into AI applications with minimal code changes.
Learners will implement baseline performance measurement and statistical drift detection to identify early signs of model degradation.
Hands-on labs include building monitoring dashboards, configuring alerts, and investigating production incidents using real system data.
The course also focuses on feedback loops, scheduled evaluations, and structured evaluation pipelines to continuously assess AI quality.
By the end of the course, participants will be able to operate, troubleshoot, and maintain production AI systems with confidence and clarity.
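As an illustration of the baseline measurement and statistical drift detection mentioned above, the following is a minimal standard-library sketch, not the course's own implementation. It assumes quality scores and feature values are plain lists of numbers, uses a mean plus or minus `k` standard deviations band as the baseline, and uses a two-sample Kolmogorov-Smirnov statistic with the usual large-sample critical value as the drift test. The function names and the default values for `k` and `alpha` are assumptions chosen for the example.

```python
import math
import statistics


def baseline_band(scores, k=3.0):
    """Return (low, high) alert bounds: mean +/- k sample standard deviations."""
    mean = statistics.fmean(scores)
    stdev = statistics.stdev(scores)
    return mean - k * stdev, mean + k * stdev


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance both samples past x so ties are handled together.
        while i < len(ref) and ref[i] <= x:
            i += 1
        while j < len(cur) and cur[j] <= x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d


def drifted(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))  # about 1.358 for alpha=0.05
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


# Illustrative data only: a reference sample and a copy shifted by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]

low, high = baseline_band([0.90, 0.92, 0.88, 0.91, 0.89])
same_flag = drifted(reference, reference)
shift_flag = drifted(reference, shifted)
```

In practice a library routine such as SciPy's `ks_2samp` would normally replace the hand-written statistic; the pure-Python version is used here only to keep the sketch self-contained. The critical-value formula is an asymptotic approximation, so it is less reliable for very small samples.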
CONTACT
+44 (0)20 8446 7555
Copyright © 2025 JBI Training. All Rights Reserved.
JB International Training Ltd - Company Registration Number: 08458005
Registered Address: Wohl Enterprise Hub, 2B Redbourne Avenue, London, N3 2BS