
Most ML/AI content falls into two camps:
- Trend summaries that don’t help you ship
- Tool promotion framed as insight
Today, we’re releasing five long-form Field Notes from production ML/AI systems, built directly from conference talks and post-deployment analysis across the TMLS and MLOps World community.
These are designed to document how real systems behave, not just what looked good on stage.
If you’re an ML engineer, MLOps practitioner, platform owner, or technical lead responsible for ML/AI and agentic systems that run beyond the demo, these were written for you.
What these Field Notes are (and what they aren’t)
They are:
- Grounded in operator experience, concrete architectures, and real-world tradeoffs
- Built from actual conference sessions, not recycled marketing content
- Focused on what scaled, what broke, and what didn’t generalize
They aren’t:
- Vendor content
- Thought leadership pieces
- Event promotion in disguise
These are the lessons we want to preserve inside the TMLS community.
What’s in the first 5 issues:
1. RAG isn’t dead. It’s just rarely production-ready.
Retrieval becomes the bottleneck under scale: latency, cost, access control, and evaluation all hit differently when real users show up. This issue breaks down RAG as a system, not just a model, and why classic retrieval baselines still hold up.
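The “classic retrieval baselines” in question usually means lexical scoring such as BM25, which remains a strong, cheap first stage before any embedding model. A minimal hand-rolled sketch (illustrative only; not code from the issue, and parameter values `k1`/`b` are the conventional defaults):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """Classic BM25: term-frequency saturation plus length normalization."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)      # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        freq = tf[term]
        score += idf * freq * (k1 + 1) / (
            freq + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

# Toy corpus: documents as token lists
docs = [
    "the cat sat on the mat".split(),
    "drift monitoring in production pipelines".split(),
]
query = "production drift".split()
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
```

A baseline like this also makes system-level failures (latency, access control, stale indexes) easier to isolate, because the scoring step itself is trivially debuggable.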
2. Agents vs. prompt engineering: when complexity pays off
Based on DeepMind’s decision framework, this one shows how to avoid overbuilding when a prompt will do—and what makes multi-agent loops brittle at production scale.
3. Self-healing pipelines: normalizing drift without chaos
Drift isn’t an edge case. This issue lays out a self-healing architecture: monitor → diagnose → intervene safely → optimize, with guardrails that prevent auto-fixes from becoming incidents.
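The monitor → diagnose → intervene → optimize loop can be sketched as follows. This is a minimal illustration of the pattern, not the architecture from the issue; the threshold, budget, and action names are hypothetical:

```python
import statistics

def monitor(scores, baseline_mean, threshold=0.1):
    """Flag drift when the observed mean moves past the threshold."""
    return abs(statistics.mean(scores) - baseline_mean) > threshold

def diagnose(scores, baseline_mean):
    """Name the direction of drift so the intervention can be targeted."""
    return "degrading" if statistics.mean(scores) < baseline_mean else "improving"

def intervene(diagnosis, auto_fix_budget):
    """Guardrail: auto-fix only while budget remains, else escalate to a human.

    Capping automated actions is what keeps auto-fixes from becoming incidents.
    """
    if auto_fix_budget <= 0:
        return "escalate"
    return "rollback" if diagnosis == "degrading" else "noop"

def self_heal(scores, baseline_mean, auto_fix_budget=3):
    if not monitor(scores, baseline_mean):
        return "healthy"
    return intervene(diagnose(scores, baseline_mean), auto_fix_budget)
```

The key design point is the budget in `intervene`: each automated fix consumes it, so a pipeline that keeps “healing” itself eventually pages a person instead of looping.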
4. Video translation at scale: why one LLM call becomes seven layers
Vimeo’s production architecture for multilingual translation isn’t just an API call. It’s orchestration, validation, retries, phonetic constraints, and evaluation logic that acts like software—not a prompt.
5. Shipping agentic AI in enterprise: Toyota’s 3–4 month playbook
How an engineering-first team ships GenAI systems with governance and interoperability built in. Less about tooling, more about operating structure as a capability.
Why we’re publishing these
TMLS is here to give production teams a credible place to share what happens after deployment, when systems have to scale, fail, and adapt under real-world constraints.
These Field Notes are part of that mission: capturing the kind of experience that rarely makes it into polished talks, so the next team doesn’t have to start from scratch.
Read the full series on Substack
Start with the one closest to what you’re shipping now: RAG, agents, pipelines, or enterprise deployment. Each week, we’ll post one more and synthesize the lessons that recur across them.