ABOUT THE SPEAKER:
David Rosenberg leads the Machine Learning Strategy team in the Office of the CTO at Bloomberg. He was a co-author of the BloombergGPT research paper, which explored what it would take to build a large language model tailored to the financial domain. He was previously an adjunct associate professor at NYU’s Center for Data Science, where he twice received the “Professor of the Year” award. Before joining Bloomberg, David served as Chief Scientist at Sense Networks, a location data analytics and mobile advertising company. He holds a Ph.D. in statistics from UC Berkeley, an S.M. in applied mathematics from Harvard University, and a B.S. in mathematics from Yale University.
TALK TITLE:
Reinforcement Learning for Large Language Models: A Modern View
TRACK:
SUB TOPIC:
ABSTRACT:
“Reinforcement Learning for Large Language Models: A Modern View” is a 3-hour tutorial for the Toronto Machine Learning Summit. It provides a motivation-first, mathematically rigorous introduction to reinforcement-learning-style post-training for large language models, aimed at machine learning researchers and advanced students who want a principled view of the methods behind modern LLM alignment and adaptation.
The tutorial starts with a brief overview of the contemporary LLM post-training pipeline and then develops the policy-gradient foundations needed to understand these methods from first principles. Instead of treating the field as a sequence of named algorithms, it organizes the material around the major design dimensions that distinguish practical approaches: how the training signal is obtained, how variance is reduced, how policy drift is controlled, how KL regularization is imposed and estimated, and how credit is assigned within a completion. Throughout, the tutorial emphasizes the tradeoffs encoded by these choices and distinguishes clearly between mathematically established results, theory-motivated arguments, practical heuristics, and empirical findings.
This perspective is then used to place methods such as REINFORCE, PPO-style RLHF, DPO, RLOO, and GRPO in a common mathematical framework, and to connect them to published descriptions of post-training in recent frontier models. Participants will leave with a unified understanding of the foundations of LLM post-training, the main ideas behind current methods, and the research questions that remain open.
WHAT YOU’LL LEARN:
Business Leaders: C-level executives, project managers, and product owners will explore best practices, methodologies, and principles for achieving ROI.
Engineers, Researchers, Data Practitioners: Will gain a better understanding of the challenges, solutions, and ideas offered in breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: Will have the opportunity to network virtually and meet more than 30 top AI companies.
What is an Ignite Talk?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides that advance automatically every 15 seconds.
The result is a fun and engaging five-minute presentation.