This advanced training program connects machine learning engineering with site reliability engineering (SRE) to create reliable, scalable, and production-ready ML systems. The course covers best practices from software engineering and DevOps throughout the ML lifecycle.
Participants will explore key topics such as ML model monitoring, data reliability, model serving strategies, and incident response, in line with established industry practice in MLOps, machine learning system design, and ML deployment.
By the end of this course, participants will be able to:
This program combines instructor-led sessions, peer discussions, case studies, and simulation labs. Participants will work in small groups to design machine learning system architectures, analyse model failures, and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Basic understanding of ML concepts, familiarity with DevOps or software engineering practices, and some experience with cloud platforms or ML frameworks (e.g., TensorFlow, PyTorch) are recommended.
Each day's session generally lasts around 4-5 hours, including breaks and interactive activities. The course runs for five days, for a total of approximately 20-25 hours of instruction.
Monitoring ML models goes beyond basic metrics like uptime and latency. It involves tracking model accuracy, feature drift, data skew, and SLO violations. Reliable Machine Learning emphasises the need for specialised observability strategies that address ML-specific failure modes.
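As an illustration of what tracking feature drift can look like in practice, the sketch below computes a Population Stability Index (PSI) between a reference sample of a feature and a recent production sample. This is only one of several possible drift statistics and is not taken from the course material. The function name, the bin count, and the 0.25 alert threshold (a commonly quoted rule of thumb) are assumptions made for the example.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between two samples of one numeric feature.

    Bin edges are derived from the reference sample; values in `current`
    that fall outside that range are counted in the first or last bin.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / bins

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical data: a training-time sample and a shifted production sample.
training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]

psi = population_stability_index(training_sample, production_sample)
ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb value, tune per feature
if psi > ALERT_THRESHOLD:
    print(f"feature drift suspected (PSI={psi:.2f})")
```

A check like this would typically run on a schedule per feature, with the result exported as a metric so that it can sit alongside latency and uptime on the same dashboards and alerting rules.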
Unlike typical MLOps training, this course emphasises operational excellence. It combines reliable machine learning principles with software engineering practices and real-world case studies of ML failures, model drift, and incident recovery.
By incorporating Site Reliability Engineering (SRE) concepts such as Service Level Objectives (SLOs) and observability, the course teaches participants to build, deploy, and manage machine learning models effectively in complex environments. It also addresses ethical considerations, feature store design, and continuous deployment, making it a modern choice for professionals seeking scalable, high-performing machine learning systems.
Credits: 5 credits per day
Course Mode: full-time
Provider: Agile Leaders Training Center
| Location | Dates | Duration | Mode | Price |
|---|---|---|---|---|
| Dubai | Week 23, 2026: 01 - 05 Jun 2026 | 5 Days | Onsite | €6,500 |
| Dubai | Week 26, 2026: 22 - 26 Jun 2026 | 5 Days | Onsite | €4,500 |
| Istanbul | Week 29, 2026: 13 - 17 Jul 2026 | 5 Days | Onsite | €4,500 |
| Dubai | Week 34, 2026: 17 - 21 Aug 2026 | 5 Days | Onsite | €4,500 |
| Istanbul | Week 35, 2026: 24 - 28 Aug 2026 | 5 Days | Onsite | €4,500 |
| Zoom | Week 41, 2026: 05 - 09 Oct 2026 | 5 Days | Online | €3,000 |
| Dubai | Week 02, 2027: 12 - 16 Jan 2027 | 5 Days | Onsite | €6,500 |