Mehul Soni

Senior AI Research Engineer, Enterprise AI Research,

The Vanguard Group

ABOUT THE SPEAKER:

Mehul is a Senior AI Engineer at Vanguard, specializing in building enterprise-scale LLM and agentic AI systems that bridge applied research and production impact. She brings over five years of industry experience applying AI/ML techniques to solve complex business problems. Her work spans LLM post-training, multi-agent systems, evaluation frameworks, and AI systems engineering, with a strong emphasis on translating cutting-edge research into scalable, production-ready solutions. Mehul is actively engaged in AI and professional communities, contributing to initiatives that promote mentorship and inclusive growth, such as Women in Data Science.

TALK TITLE:

Scaling Production-Grade LLMs: Diagnosing Hidden Bottlenecks in Training and Inference Systems

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Inference Serving & Optimization

ABSTRACT:

While recent advances in LLMs emphasize improved model capabilities, many systems fail to scale in real-world production settings. Beyond a certain point, adding GPUs or data yields diminishing returns: training stops scaling efficiently, hardware remains underutilized, and inference latency is dominated by system constraints rather than compute. These failures are often silent, poorly documented, and difficult to diagnose in distributed environments.

In this talk, we share lessons from building enterprise-scale domain LLM systems, focusing on the system-level bottlenecks that limit scaling in practice. We examine failure modes in distributed training (communication overhead, pipeline imbalance, and numerical instability) and in inference (memory-bound decoding, KV cache growth, and throughput–latency tradeoffs), and show how they manifest in production systems.

Rather than introducing new modeling techniques, this session presents a practical, symptom-driven approach to debugging: identifying failure patterns, tracing their root causes, and applying targeted mitigations. The key takeaway is that scaling LLMs is fundamentally a systems problem, and attendees will leave with a concrete framework to diagnose bottlenecks and make better design decisions when moving from prototype to production.

WHAT YOU’LL LEARN:

TBA
