
Most ML/AI content falls into two camps:
- Trend summaries that don’t help you ship
- Tool promotion framed as insight
Today, we’re releasing five long-form Field Notes from production ML/AI systems, built directly from conference talks and post-deployment analysis across the TMLS and MLOps World community.
These are designed to document how real systems behave, not just what looked good on stage.
If you’re an ML engineer, MLOps practitioner, platform owner, or technical lead responsible for ML/AI and agentic systems that run beyond the demo, these were written for you.
What these Field Notes are (and what they aren’t)
They are:
- Grounded in operator experience, concrete architectures, and real-world tradeoffs
- Built from actual conference sessions, not recycled marketing content
- Focused on what scaled, what broke, and what didn’t generalize
They aren’t:
- Vendor content
- Thought leadership pieces
- Event promotion in disguise
These are the lessons we want to preserve inside the TMLS community.
What’s in the first 5 issues:
1. RAG isn’t dead. It’s just rarely production-ready.
Retrieval becomes the bottleneck under scale: latency, cost, access control, and evaluation all hit differently when real users show up. This issue breaks down RAG as a system, not just a model, and why classic retrieval baselines still hold up.
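The “classic retrieval baselines” in question usually means lexical scoring such as BM25, which remains a strong, cheap first stage before any embedding model. A minimal hand-rolled sketch (illustrative only; not code from the issue, and parameter values `k1`/`b` are the conventional defaults):

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """Classic BM25: term-frequency saturation plus length normalization."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)      # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        freq = tf[term]
        score += idf * freq * (k1 + 1) / (
            freq + k1 * (1 - b + b * len(doc) / avgdl)
        )
    return score

# Toy corpus: documents as token lists
docs = [
    "the cat sat on the mat".split(),
    "drift monitoring in production pipelines".split(),
]
query = "production drift".split()
ranked = sorted(docs, key=lambda d: bm25_score(query, d, docs), reverse=True)
```

A baseline like this also makes system-level failures (latency, access control, stale indexes) easier to isolate, because the scoring step itself is trivially debuggable.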
2. Agents vs. prompt engineering: when complexity pays off
Based on DeepMind’s decision framework, this one shows how to avoid overbuilding when a prompt will do—and what makes multi-agent loops brittle at production scale.
3. Self-healing pipelines: normalizing drift without chaos
Drift isn’t an edge case. This issue lays out a self-healing architecture: monitor → diagnose → intervene safely → optimize, with guardrails that prevent auto-fixes from becoming incidents.
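The monitor → diagnose → intervene → optimize loop can be sketched as follows. This is a minimal illustration of the pattern, not the architecture from the issue; the threshold, budget, and action names are hypothetical:

```python
import statistics

def monitor(scores, baseline_mean, threshold=0.1):
    """Flag drift when the observed mean moves past the threshold."""
    return abs(statistics.mean(scores) - baseline_mean) > threshold

def diagnose(scores, baseline_mean):
    """Name the direction of drift so the intervention can be targeted."""
    return "degrading" if statistics.mean(scores) < baseline_mean else "improving"

def intervene(diagnosis, auto_fix_budget):
    """Guardrail: auto-fix only while budget remains, else escalate to a human.

    Capping automated actions is what keeps auto-fixes from becoming incidents.
    """
    if auto_fix_budget <= 0:
        return "escalate"
    return "rollback" if diagnosis == "degrading" else "noop"

def self_heal(scores, baseline_mean, auto_fix_budget=3):
    if not monitor(scores, baseline_mean):
        return "healthy"
    return intervene(diagnose(scores, baseline_mean), auto_fix_budget)
```

The key design point is the budget in `intervene`: each automated fix consumes it, so a pipeline that keeps “healing” itself eventually pages a person instead of looping.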
4. Video translation at scale: why one LLM call becomes seven layers
Vimeo’s production architecture for multilingual translation isn’t just an API call. It’s orchestration, validation, retries, phonetic constraints, and evaluation logic that acts like software—not a prompt.
5. Shipping agentic AI in enterprise: Toyota’s 3–4 month playbook
How an engineering-first team ships GenAI systems with governance and interoperability built in. Less about tooling, more about operating structure as a capability.
Why we’re publishing these
TMLS is here to give production teams a credible place to share what happens after deployment, when systems have to scale, fail, and adapt under real-world constraints.
These Field Notes are part of that mission: capturing the kind of experience that rarely makes it into polished talks, so the next team doesn’t have to start from scratch.
Read the full series on Substack
Start with the one closest to what you’re shipping now: RAG, agents, pipelines, or enterprise deployment. Each week, we’ll post one more and synthesize the lessons that recur across them.