ABOUT THE SPEAKER:
Deepkamal Kaur Gill is a Senior Applied AI Scientist at Vanguard, where she builds production-grade LLM systems for high-stakes financial applications. Her work spans data generation, post-training, and evaluation, with a focus on building reliable, low-latency AI systems under real-world constraints.
Deepkamal holds a Master’s in Computer Science from the University of Toronto and is an active contributor to the AI community through research, mentorship, and initiatives supporting women in technology. At TMLS, she brings a practitioner’s perspective on what it truly takes to scale LLMs in production.
TALK TITLE:
TRACK:
SUB TOPIC:
ABSTRACT:
While recent advances in LLMs emphasize improved model capabilities, many systems fail to scale in real-world production settings. Beyond a certain point, adding GPUs or data yields diminishing returns: training stops scaling efficiently, hardware remains underutilized, and inference latency is dominated by system constraints rather than compute. These failures are often silent, poorly documented, and difficult to diagnose in distributed environments.
In this talk, we share lessons from building enterprise-scale domain LLM systems, focusing on the system-level bottlenecks that limit scaling in practice. We examine failure modes across distributed training (communication overhead, pipeline imbalance, and numerical instability) and inference (memory-bound decoding, KV cache growth, and throughput–latency tradeoffs), and show how they manifest in production systems.
Rather than introducing new modeling techniques, this session presents a practical, symptom-driven approach to debugging: identifying failure patterns, tracing their root causes, and applying targeted mitigations. The key takeaway is that scaling LLMs is fundamentally a systems problem, and attendees will leave with a concrete framework to diagnose bottlenecks and make better design decisions when moving from prototype to production.
WHAT YOU’LL LEARN:
Business Leaders: C-Level Executives, Project Managers, and Product Owners will explore best practices, methodologies, and principles for achieving ROI.
Engineers, Researchers, Data Practitioners: will gain a better understanding of the challenges, solutions, and ideas offered via breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: will have the opportunity to network virtually and meet more than 30 top AI companies.
What is an Ignite Talk?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.
The result is a fun and engaging five-minute presentation.