ABOUT THE SPEAKER:
Anthony is a Senior Research Machine Learning Scientist, leading a team responsible for delivering predictive models in the finance & trading space. Prior to joining Layer 6, Anthony completed a PhD in the Department of Statistics at the University of Oxford, with a focus on statistical machine learning and generative modelling. He also completed a BMath and MMath at the University of Waterloo, with several internships focused on finance and research. Besides the applied side, Anthony has also helped deliver over fifteen research papers to top conferences and journals whilst at Layer 6, focusing on the areas of generative modelling, tabular data analysis, and anomaly detection.
TALK TITLE:
TRACK:
SUB TOPIC:
ABSTRACT:
Tabular data is ubiquitous worldwide, driving solutions for generic business problems, applied time series forecasting, and beyond. The inherent heterogeneity of this data has hindered Tabular Foundation Models (TFMs) from rapidly generalizing to unseen datasets. In-Context Learning (ICL) offers a promising path for TFMs, enabling dynamic task adaptation without fine-tuning. Moving beyond re-purposed language models, we propose combining ICL-based retrieval with self-supervised learning to train dedicated TFMs. We evaluate real versus synthetic pre-training data, demonstrating that real data captures complex signals critical for improving downstream generalization. Incorporating this real data yields significantly faster training and superior adaptability across diverse contexts. Our resulting model, TabDPT, achieves strong performance across varied classification and regression benchmarks. Importantly, our pre-training procedure demonstrates that scaling model and data size drives consistent, power-law performance improvements. This echoes foundational scaling laws, confirming that robust, large-scale, and equitable TFMs are highly achievable. We have open-sourced our complete training and inference pipeline.
WHAT YOU’LL LEARN:
Tabular foundation models continue to improve rapidly. Real data has been shown to be a legitimate option for pre-training, despite previously being underutilized in favour of synthetic pre-training data. Tabular foundation models are also starting to demonstrate scaling laws, much like LLMs.
Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, and principles for achieving ROI.
Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: Will have the opportunity to network virtually and meet 30+ top AI companies.
What is an Ignite Talk?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.
The result is a fun and engaging five-minute presentation.