David Rosenberg

Head of Machine Learning Strategy, CTO Office,

Bloomberg

ABOUT THE SPEAKER:

David Rosenberg leads the Machine Learning Strategy team in the Office of the CTO at Bloomberg. He was a co-author of the BloombergGPT research paper, which explored what it would take to build a large language model tailored to the financial domain. He was previously an adjunct associate professor at NYU’s Center for Data Science, where he twice received the “Professor of the Year” award. Before joining Bloomberg, David served as Chief Scientist at Sense Networks, a location data analytics and mobile advertising company. He holds a Ph.D. in statistics from UC Berkeley, an S.M. in applied mathematics from Harvard University, and a B.S. in mathematics from Yale University.

TALK TITLE:

Reinforcement Learning for Large Language Models: A Modern View

TRACK:

Fundamental Research (No Direct Business ROI)

SUB TOPIC:

Reinforcement Learning & Control

ABSTRACT:

This 3-hour tutorial for the Toronto Machine Learning Summit gives a motivation-first, mathematically rigorous introduction to reinforcement-learning-style post-training for large language models. It is aimed at machine learning researchers and advanced students who want a principled view of the methods behind modern LLM alignment and adaptation.

The tutorial starts with a brief overview of the contemporary LLM post-training pipeline and then develops the policy-gradient foundations needed to understand these methods from first principles. Instead of treating the field as a sequence of named algorithms, it organizes the material around the major design dimensions that distinguish practical approaches: how the training signal is obtained, how variance is reduced, how policy drift is controlled, how KL regularization is imposed and estimated, and how credit is assigned within a completion. Throughout, the tutorial emphasizes the tradeoffs encoded by these choices and distinguishes clearly between mathematically established results, theory-motivated arguments, practical heuristics, and empirical findings.

This perspective is then used to place methods such as REINFORCE, PPO-style RLHF, DPO, RLOO, and GRPO in a common mathematical framework, and to connect them to published descriptions of post-training in recent frontier models. Participants will leave with a unified understanding of the foundations of LLM post-training, the main ideas behind current methods, and the research questions that remain open.

WHAT YOU’LL LEARN:

TBA
