Highlights
- Define what AI production failure looks like
- Instrument an AI app for monitoring
- Detect output quality degradation
- Identify incoming data drift
- Build a real-time AI health dashboard
- Set meaningful alerting thresholds
- Root-cause an AI regression
- Implement a user feedback loop
- Schedule automated evaluation runs
- Create an on-call incident runbook
Course Details
- AI failure taxonomy: silent degradation, sudden regression, and data drift explained through case studies from real production systems
- Instrumentation lab: adding structured logging, metric emission, and trace IDs to an existing AI application with minimal code changes
- Baseline setting: calculating normal output quality distributions so degradation triggers meaningful alerts rather than noise
- Data drift detection: applying statistical tests to identify when incoming features or text inputs have shifted from the training distribution
- Dashboard build lab: constructing a real-time AI health view in Grafana or Power BI from scratch using your instrumented data
- Alerting configuration: setting sensible thresholds, avoiding alert fatigue through grouping, and routing alerts to the right person
- Incident investigation lab: given a set of production logs from a degraded system, participants find and explain the root cause
- Feedback loop design: capturing user corrections, escalations, and negative signals and routing them back into your evaluation pipeline
- Scheduled evaluation: running your held-out test set on a nightly schedule and reporting quality score trends in a digest email
- Runbook writing workshop: producing a clear on-call guide that a non-ML engineer can follow when facing an AI production incident
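As a taste of the baseline-setting topic, the sketch below shows one simple way to turn a window of known-good quality scores into an alert rule. It is an illustration only, not the course's own lab material: the function names, the 0-to-1 score scale, the sample values, and the z-score threshold of 3 are all assumptions made for this example.

```python
from statistics import mean, stdev


def build_baseline(scores):
    """Summarise known-good quality scores as (mean, sample standard deviation)."""
    return mean(scores), stdev(scores)


def is_degraded(recent_scores, baseline, z_threshold=3.0):
    """Return True when the recent mean sits more than z_threshold
    standard errors below the baseline mean."""
    base_mean, base_std = baseline
    standard_error = base_std / len(recent_scores) ** 0.5
    if standard_error == 0:
        # Baseline had no variation at all: any drop counts as degradation.
        return mean(recent_scores) < base_mean
    z = (mean(recent_scores) - base_mean) / standard_error
    return z < -z_threshold


# Hypothetical quality scores (0 to 1) from a period judged to be healthy.
baseline = build_baseline([0.90, 0.92, 0.88, 0.91, 0.89, 0.93, 0.90, 0.87])

# Two hypothetical recent windows: one normal, one clearly worse.
healthy_window = [0.91, 0.89, 0.90, 0.92]
degraded_window = [0.80, 0.82, 0.79, 0.81]

healthy_alert = is_degraded(healthy_window, baseline)
degraded_alert = is_degraded(degraded_window, baseline)
```

Comparing the recent mean against the baseline in units of standard error, rather than against a fixed cut-off, is one way to keep ordinary score variation from paging anyone while still catching a sustained drop.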
Who should attend
Data and MLOps
Feedback
4.8 out of 5 average
"Our tailored course provided a well rounded introduction and also covered some intermediate level topics that we needed to know. Clive gave us some best practice ideas and tips to take away. Fast paced but the instructor never lost any of the delegates"
Brian Leek, Data Analyst, May 2022