Deepkamal Gill

Senior AI/ML Scientist,

The Vanguard Group

ABOUT THE SPEAKER:

Deepkamal Kaur Gill is a Senior Applied AI Scientist at Vanguard, where she builds production-grade LLM systems for high-stakes financial applications. Her work spans data generation, post-training, and evaluation, with a focus on building reliable, low-latency AI systems under real-world constraints.

Deepkamal holds a Master’s in Computer Science from the University of Toronto and is an active contributor to the AI community through research, mentorship, and initiatives supporting women in technology. At TMLS, she brings a practitioner’s perspective on what it truly takes to scale LLMs in production.

TALK TITLE:

Scaling Production-Grade LLMs: Diagnosing Hidden Bottlenecks in Training and Inference Systems

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Inference Serving & Optimization

ABSTRACT:

While recent advances in LLMs emphasize improved model capabilities, many systems fail to scale in real-world production settings. Beyond a certain point, adding GPUs or data yields diminishing returns: training stops scaling efficiently, hardware remains underutilized, and inference latency is dominated by system constraints rather than compute. These failures are often silent, poorly documented, and difficult to diagnose in distributed environments.

In this talk, we share lessons from building enterprise-scale domain LLM systems, focusing on the system-level bottlenecks that limit scaling in practice. We examine failure modes across distributed training (communication overhead, pipeline imbalance, and numerical instability) and inference (memory-bound decoding, KV cache growth, and throughput–latency tradeoffs), and show how they manifest in production systems.

Rather than introducing new modeling techniques, this session presents a practical, symptom-driven approach to debugging: identifying failure patterns, tracing their root causes, and applying targeted mitigations. The key takeaway is that scaling LLMs is fundamentally a systems problem, and attendees will leave with a concrete framework to diagnose bottlenecks and make better design decisions when moving from prototype to production.

WHAT YOU’LL LEARN:

TBA
