9th Annual Toronto Machine Learning Summit (TMLS) 2025

Proud partner

June 13th: Virtual Talks
June 16th & 17th: In-person Talks & Networking
June 18th: Workshops

Drop us a line: info@torontomachinelearning.com
Sponsorships: faraz@torontomachinelearning.com

17 Tracks Carefully Crafted by a Committee of AI Leaders

You’re not the only one under pressure to deliver AI results without clear playbooks. Join peers navigating the same leadership challenges—from aligning cross-functional technical teams to managing vendor complexity and measuring ROI. Gain insights from deep technical (100-400 level) sessions focused on what’s working in the real world – and what’s to come!

Advanced RAG

We go beyond toy RAG demos to show what it takes to build production systems. From naive baselines to GraphRAG and Prolog-based logic, this track unpacks real failures, benchmarks, and breakthroughs in using LLMs to decode complex domains like government policy. If you like this, you may want to check out the Inference Scaling and Data Preparation & Processing tracks. See the full agenda

From modular DAG-based systems in private equity to multi-function agentic chatbots in banking, this track showcases how real agentic systems are being built and shipped. We’ll cover hard lessons, evals that actually matter, and what it takes to move beyond toy prompts to reliable, production-grade AI agents. See the full agenda

Forget checkbox ethics. This track explores how teams are making AI governance real—baking trust, security, and accountability into workflows. From CI/CD-integrated risk scans to role-based GenAI frameworks, it’s where compliance meets engineering. A complement to the Exec and Agentic Systems tracks. See the full agenda

From Vimeo’s multilingual dubbing engine to agentic chatbots in banking and NLP workflows at Reuters, this track highlights real GenAI systems driving measurable productivity gains. It’s where toolchains meet output, and teams get faster—without cutting corners. A complement to Traditional ML and Data Prep tracks. See the full agenda

Careers

The lines between ML researcher, engineer, and AI specialist are blurring—tooling, constraints, and responsibilities are shifting fast. This track shows how roles are evolving, from system integration to agentic infra, and helps you navigate where to grow next.

This track digs into what it really takes to structure data for next-gen systems. Includes LLMs for imputation, unified lakes over stitched silos, and pipelines that hold up in production.

From OpenAI’s work on memory and personalization to Google DeepMind’s vision for general agents and 3D world models, this track dives into what’s shaping the next era of AI. Expect bold ideas on reasoning, adaptation, and where foundation models go next.

What does it take to ship GenAI where privacy and compliance aren’t optional? From OMERS’ call summarization to Thomson Reuters’ due diligence automation and National Bank’s 100% complaint coverage, this track features real deployments in finance, media, pharma, and more.

Hardware Platforms

This track dives into how hardware choices impact every phase of GenAI—covering training on AMD Instinct GPUs, quantization with Brevitas, and fine-tuning techniques using Megatron-LM and HF-PEFT. Explore how compute, memory, and architecture shape performance, cost, and deployment readiness.

LLMs are powerful—but expensive to serve. This track explores the cutting edge of efficient inference, with techniques like quantization, speculative decoding, and FlashAttention. Hear from NVIDIA and d-Matrix, plus a DeepSeek case study on how FlashMLA pushes the limits of throughput without sacrificing quality.

You don’t need a massive platform team to ship reliable AI. This track highlights lightweight, production-ready MLOps strategies—like Vector Institute’s IaC-based inferencing pipeline for AWS and GCP—built for speed, clarity, and constrained resources.

Not every AI experiment leads to impact. This track surfaces what didn’t work—like the organizational roadblocks Google engineers faced when top-down AI hype met on-the-ground complexity. It explores real lessons from false starts, stalled rollouts, and why some teams never make it past POC.
Traditional ML

Not everything needs a transformer. This track highlights enduring ML techniques applied in modern ways—Scotiabank’s causal forecasting, Meta’s work on popularity bias in recommender systems, and Tutte Institute’s rethink of unsupervised learning in high-dimensional spaces. Still relevant, still evolving.

As enterprises shift from prototypes to real deployments, verticalized AI agents are emerging as key infrastructure. This track explores how organizations are using tailored agents to solve domain-specific problems, integrate with complex systems, and deliver measurable outcomes at scale. See the full agenda

This track explores models that process video, audio, and images—like MoCha for cinematic-quality talking characters, Vamba for long-form video understanding, and diffusion-powered personalization for retail. If your AI needs to see, hear, and speak, this is the frontier. See the full agenda

With leaders from CIBC, BMO, Scotiabank, and Layer 6, this track explores how Canada’s top financial institutions are approaching AI in a high-stakes, highly regulated environment. From bridging data readiness gaps to driving ROI at scale, these sessions offer strategic, actionable insights for execs navigating real-world AI adoption. See the full agenda

The era of “smaller, smarter” models is here. This track dives into compact LLMs like SmolLM, efficient training strategies that challenge traditional scaling laws, and practical finetuning techniques using open-source tools. Learn how to train, customize, and optimize models locally. Whether you’re tuning for performance, portability, or cost, this track equips you to build smarter with less. See the full agenda

We'd like to extend a heartfelt thank you to the Track Leads and Track Committee Members.
Why Attend

A Unique Experience

For 9 years, TMLS has hosted a unique blend of cutting-edge research, hands-on workshops, and vetted industry case studies, reviewed by the Committee to support your team’s expansion and growth.

We emphasize community, learning, and accessibility.

Join Today

Explore the Uncharted Frontiers of Generative AI

Big Ideas Showcase

See groundbreaking innovations and meet the innovators pushing technological boundaries in Gen-AI.

Explore & Network

Explore real-world case studies, cut through the hype, and gain valuable insights into the latest advancements and trends in deploying AI in production environments in this rapidly evolving field. Network with fellow practitioners and business leaders.

How Does it Work?

Virtual Talks & Workshops

Virtual talks and workshops.

June 13th
9:30 AM – 5:00 PM EST

Upskill via Workshops

Bonus in-person hands-on workshops.

101 College St,
Toronto, ON M5G 1L7

June 18th
9:30 AM – 4:15 PM EST

Network via Community App

Introduce yourself and meet speakers/attendees!

Event app opens
June 2nd

Plot Your Schedule and Attend the Summit!

See 60+ talks and case studies across various tracks and industries.

81 Bay St.,
Toronto, ON M5J 0E7

June 16th & 17th
8:45 AM – 4:50 PM EST

Paid Add-On Hackathon

“Building Applications with Open Source LLMs for Fun and Profit” hackathon.

No Summit ticket purchase necessary.

192 Spadina Ave.,
Toronto, ON M5T 2C2
July 11th
9 AM – 8 PM EST
Hike at High Park

Join us at High Park for an afternoon of networking and winding down with fellow attendees.

High Park North Gates

1873 Bloor St W, Toronto, ON M6R 2Z3

July 14th
11 AM – 2 PM EST

Event Speakers

Loubna Ben Allal

Research Engineer & SmolLM Lead, Hugging Face
Talk Title: SmolLM: The Rise of Smol Models

Cong Wei

PhD Student, University of Waterloo
Talk Title: MoCha: Towards Movie-Grade Talking Character Synthesis

Bang Liu

Associate Professor, University of Montreal & Mila
Talk Title: Advances and Challenges in Foundation Agents

Rose Genele

CEO, The Opening Door
Talk Title: Who Owns AI Ethics? Building Accountable Teams in the Age of Machine Learning

Bonnie Li

Research Engineer, Google DeepMind
Talk Title: Towards World Models and General Agents

Amin Atashi

Senior Machine Learning Engineer, The Globe and Mail
Talk Title: Beyond the Hype: Real-World Gen AI Bots in Action

Matthieu Lemay

Co-Founder & AI Strategist, Lemay.ai
Talk Title: Navigating AI Compliance: ISO 42001, the EU AI Act, and the Future of Regulated AI

Soumye Singhal

Research Scientist, NVIDIA
Talk Title: Llama-Nemotron: Efficient Open Reasoning Models

Vikram Appia

Principal Member of Technical Staff, AMD
Talk Title: Training on AMD Instinct GPUs: From Pre-training to Fine-tuning and Post-training Strategies

Weiming Ren

PhD Student, University of Waterloo
Talk Title: Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

W. Ian Douglas

Developer Advocate, Block Open Source Developer Platforms
Workshop: Stop RESTing -- Wake up your AI with MCP

Kaustubh Prabhakar

Member of Technical Staff, OpenAI
Talk Title: Role of Memory and Personalization in AI Systems

Hamza Farooq

CEO & Founder, Traversaal.ai | Adjunct Stanford
Talk Title: Building Agents from Scratch

Shashank Shekhar

Independent Researcher
Talk Title: Case Study: How Does DeepSeek's FlashMLA Speed up Inference?

Leland McInnes

Researcher, Tutte Institute
Talk Title: Rethinking Unsupervised Learning

Tanushree Nori

Principal Data Scientist, Vimeo
Talk Title: From One Voice to Forty: Inside Vimeo’s Dubbing Engine

Sponsors

Platinum Sponsor
Sponsors
Community Partners

Interested in Partnering? Email Faraz at faraz@torontomachinelearning.com

Who Attends

Data Practitioners
Researchers/Academics
Business Leaders

2024 Event Demographics

Delegate Attendees per Conference
Highly Qualified Practitioners
Currently Working in Industry*
Attendees Looking for Solutions
Currently Hiring
Attendees Actively Job-Searching

2024 Technical Background

Expert/Researcher
18.5%
Advanced
44.66%
Intermediate
27.37%
Beginner
9.39%

Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, and principles for achieving ROI.

Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: Will have the opportunity to network virtually and meet 30+ top AI companies.

Why TMLS

TMLS is a community response addressing the need to unite academic research, industry opportunities and business strategy in an environment that is safe, welcoming and constructive for those working in the fields of ML/AI.

See our team and learn more about the Toronto Machine Learning Society here.

Tickets

This event has ended

Event Agenda

Talk Title: SmolLM: The Rise of Smol Models

Presenter:
Loubna Ben Allal, Research Engineer & SmolLM Lead, Hugging Face

About the Speaker:
Loubna Ben Allal is a Research Engineer in the Science team at Hugging Face, where she leads the training of small language models (SmolLMs) and their data curation. Previously, she worked on large language models for code and was a core member of the BigCode team behind The Stack datasets and StarCoder models for code generation.

Track: Opensource Model Finetuning

Technical Level: 3/7

Abstract:
On-device language models are revolutionizing AI by making advanced models accessible in resource-constrained environments. In this talk, we will explore the rise of small models and how they are reshaping the AI landscape, moving beyond the era of scaling to ever-larger models. We will also cover SmolLM, a series of compact yet powerful LLMs, focusing on data curation, and ways to leverage these models for on-device applications.

What You’ll Learn:
Small, well-trained language models—built through smart design and thoughtful data curation—can deliver impressive performance, making them ideal for on-device use.

Talk Title: MoCha: Towards Movie-Grade Talking Character Synthesis

Presenter:
Cong Wei, PhD Student, University of Waterloo

About the Speaker:
Cong Wei is a second-year PhD student in Computer Science at the University of Waterloo, supervised by Prof. Wenhu Chen, and a recent research intern at Meta. His research focuses on generative AI and multimodal LLMs, with a particular interest in diffusion models for simulation and digital humans. He has published at top-tier conferences such as ECCV, CVPR, and ICLR.

Track: Multimodal LLMs

Technical Level: 3/7

Abstract:
Recent advancements in video generation have achieved impressive motion realism, yet they often overlook character-driven storytelling—a crucial component for automated film and animation generation. We introduce Talking Characters, a more realistic task that involves generating full-body character animations directly from speech and text. Unlike traditional talking head generation, Talking Characters aims to produce the full portrait of one or more characters, extending beyond the facial region.

In this work, we propose MoCha, the first model of its kind for generating talking characters. To ensure precise synchronization between video and speech, we introduce a speech-video window attention mechanism that effectively aligns audio and visual tokens. To address the lack of large-scale speech-labeled video datasets, we propose a joint training strategy that leverages both speech-labeled and text-labeled videos, significantly improving generalization across diverse character actions.

We also design structured prompt templates with character tags, enabling—for the first time—multi-character conversations with turn-based dialogue. This allows AI-generated characters to engage in context-aware interactions with cinematic coherence. Extensive qualitative and quantitative evaluations, including human preference studies and benchmark comparisons, show that MoCha sets a new standard in AI-generated cinematic storytelling, achieving superior realism, expressiveness, controllability, and generalization.

What You’ll Learn:
Automated filmmaking and digital humans represent the future of storytelling — and MoCha takes a meaningful step toward making that future a reality.

Talk Title: Advances and Challenges in Foundation Agents

Presenter:
Bang Liu, Associate Professor, University of Montreal & Mila

About the Speaker:
Bang Liu is an Associate Professor in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal (UdeM). He is a member of the RALI laboratory (Applied Research in Computer Linguistics) of DIRO, a member of Institut Courtois of UdeM, an associate member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair. His research interests primarily lie in the areas of natural language processing, multimodal & embodied learning, theory and techniques for AGI (e.g., understanding and improving large language models and intelligent agents), and AI for science (e.g., material science, health).

Track: Agents Zero To Hero

Technical Level: 1/7

Abstract:
The advent of large language models (LLMs) has revolutionized artificial intelligence, laying the foundation for sophisticated intelligent agents capable of reasoning, perceiving, and acting across diverse domains. These agents are increasingly central to advancing AI research and applications, yet their design, evaluation, and enhancement pose intricate challenges. In this talk, we will offer a fresh perspective by framing intelligent agents through a modular and cognitive science-inspired lens, bridging AI design with insights from different disciplines to propose a unified framework for understanding their core functionalities and future potential. We will explore the modular design of intelligent agents and present a framework for cognition, perception, action, memory, reward systems, and so on. Then we will discuss each module in detail. Our talk aims to provide a holistic and interdisciplinary perspective for intelligent agent research.

What You’ll Learn:
The concept and architecture of foundation agents.

Talk Title: Who Owns AI Ethics? Building Accountable Teams in the Age of Machine Learning

Presenter:
Rose Genele, CEO, The Opening Door

About the Speaker:
As CEO of The Opening Door, Rose specializes in responsible artificial intelligence integration for investors and enterprise companies. Her work emphasizes the importance of safe and ethical AI—where technologies are designed with fairness, transparency, accountability, and human-centred design in mind. With years of experience in the tech industry, Rose has developed a reputation as an AI transformationalist with a penchant for data, ethics, and futures-forward thinking.
Rose sits on the board of the Canadian Centre for Ethics and Corporate Policy, and Volcano Theatre. She was a 2024 nominee for the RBC Canadian Women Entrepreneur Awards, and recipient of Canada’s Top 100 Black Women to Watch of 2024 Award. Rose is also an alumna of Toronto Metropolitan University, with a Bachelor of Commerce in Law.

Track: AI Ethics And Governance Within The Organization

Technical Level: 2/7

Abstract:
As AI systems move from research labs into real-world applications, the question of responsibility becomes increasingly urgent: who owns AI ethics inside the organization? Is it the data scientists building the models? The legal team writing the policies? Or leadership setting the vision?

This talk dives into the organizational dimension of Responsible AI, unpacking what it takes to move ethical principles off the page and into the workflows of technical teams. Drawing from real-world examples across industries, we’ll explore how leading organizations are structuring cross-functional governance, distributing ethical responsibilities, and embedding accountability into the AI development lifecycle.

Attendees will leave with a clear understanding of:
-How to define and assign roles in Responsible AI initiatives
-Organizational models for AI governance (and when to use them)
-Practical strategies to empower ML professionals to make ethically-informed decisions
-Common pitfalls when ethics becomes a “check-the-box” activity—and how to avoid them

Whether you’re part of an ML research team, a startup shipping AI products, or a mature enterprise scaling AI operations, this session will help you reimagine AI ethics as a team sport, and not a compliance burden.

What You’ll Learn:
1. Governance models for AI within startups vs. enterprises
2. What effective RAI teams look like: roles, rituals, and decision gates
3. How to create a culture of accountability across technical and non-technical staff

Talk Title: Towards World Models and General Agents

Presenter:
Bonnie Li, Research Engineer, Google DeepMind

About the Speaker:
Bonnie is a researcher at Google DeepMind working on frontier AI models and building generally intelligent agents. She worked on the Gemini Thinking models and Genie 2, and is interested in how RL can unlock new capabilities in large-scale models. Previously she worked at Nvidia and co-founded a deep tech startup backed by Khosla Ventures.

Track: Future Trends

Technical Level: 1/7

Abstract:
The dream of general AI agents—capable of learning, adapting, and acting across any task or environment—is within reach. This talk traces a path toward this vision through world models and reinforcement learning. Genie 2 introduces a new paradigm of foundation world models – generating 3D interactive worlds from any text or image. LMAct extends large language models into interactive agents by grounding them in environments. Reinforcement Learning unlocks new capabilities in LLMs by enabling LLMs to autonomously optimize rewards. Together, these developments set the stage for a new generation of agents—capable of reasoning, acting, and self-improving across diverse domains.

What You’ll Learn:
– How world models work
– How RL improves LLMs in specific tasks

Talk Title: Beyond the Hype: Real-World Gen AI Bots in Action

Presenter:
Amin Atashi, Senior Machine Learning Engineer, The Globe and Mail

About the Speaker:
Amin Atashi is a dedicated AI researcher and engineer specializing in generative AI. He focuses on hosting and scaling generative AI solutions that drive innovation in digital media, with a particular emphasis on transforming the news industry. Drawing on his background in optimization models and sensor fusion, Amin develops practical, scalable approaches that address real-world challenges. He has shared his insights at events like the Enterprise AI Summit Canada and the Generative AI Summit Toronto, always striving to make advanced AI solutions more accessible and impactful.

Track: GenAI Deployments In Regulated Industries

Technical Level: 3/7

Abstract:
Deploying generative AI bots in the real world is an exciting yet complex journey. In this talk, I’ll walk through the practical challenges of building, scaling, and maintaining a production-grade GenAI bot—covering everything from prompt engineering and hallucination control to infrastructure, cost management, and monitoring. Drawing from firsthand experience deploying a GenAI bot for a national media platform, I’ll share lessons learned, pitfalls to avoid, and strategies that helped turn a prototype into a reliable, user-facing product. Whether you’re just exploring GenAI or working toward deployment, this talk will offer actionable insights and hard-earned takeaways.

What You’ll Learn:
– Practical strategies for transitioning generative AI bots from prototypes to scalable, production-ready systems.
– An understanding of the key infrastructure challenges—such as hosting, latency, and reliability—and how to overcome them.
– Real-world case studies and lessons learned that illustrate how to manage and optimize AI deployments.
– Insights into the iterative process of refining AI models for robust, user-focused applications.
– How these strategies can be applied across industries to unlock the full potential of generative AI.

Talk Title: Navigating AI Compliance: ISO 42001, the EU AI Act, and the Future of Regulated AI

Presenter:
Matthieu Lemay, Co-Founder & AI Strategist, Lemay.ai

About the Speaker:
Matt Lemay is a leading expert in AI governance, compliance, and machine learning deployment for highly regulated industries. As the Co-Founder of Lemay.ai, he specializes in designing AI systems that align with ISO 42001, the EU AI Act, and other global regulatory frameworks. With deep expertise in finance, healthcare, aerospace, and defence, Matt helps organizations navigate the challenges of safe, transparent, and ethical AI implementation.

Areas of Expertise:
AI Regulation & Governance – Ensuring compliance with ISO 42001, the EU AI Act, and global AI policies.
AI in Regulated Industries – Practical experience deploying AI in high-stakes sectors like MedTech, finance, and defence.
Machine Learning Risk Management – Strategies for bias mitigation, explainability, and security in AI systems.
AI Strategy & Policy – Helping organizations adapt to emerging AI regulations and compliance challenges.

Matt is a certified ISO 42001 lead auditor advocating for responsible AI development. He speaks at global industry events, contributing to discussions on AI ethics, policy, and the future of AI governance. His work has influenced AI adoption policies in North America and Europe, making him a sought-after speaker for leaders, policymakers, and AI practitioners.

Upcoming Engagements:
Swiss Biotech Day 2025 – Speaking on AI compliance in MedTech.
Aeromart Montréal – Discussing AI risk management in aerospace and defence.
Toronto Machine Learning Summit – Presenting on ISO 42001 & the EU AI Act.
Halifax Energy Summit – Exploring AI’s role in energy efficiency and sustainability.
Dubai MedTech Conference – Addressing AI in healthcare and regulatory compliance.

Track: GenAI Deployments In Regulated Industries

Technical Level: 1/7

Abstract:
As artificial intelligence becomes increasingly integrated into critical sectors such as healthcare, finance, aerospace, and defence, the demand for standardized AI governance has never been higher. ISO 42001, the first international standard for AI management systems, alongside the EU AI Act, represents a shift towards regulatory oversight that prioritizes safety, transparency, and accountability in AI systems.

In this session, Matt Lemay, Co-Founder of Lemay.ai, will explore the intersection of machine learning and regulatory compliance, offering insights into how businesses can align their AI innovations with evolving global standards. The talk will cover:
– Key provisions of ISO 42001 and the EU AI Act and their implications for AI practitioners.
– Challenges in implementing compliant AI systems, including bias mitigation, security, and ethical considerations.
– Lessons from highly regulated industries, such as aerospace, MedTech, and finance, on deploying AI safely.
– Best practices for AI risk management and governance, ensuring models remain explainable, auditable, and compliant.
– Future trends in AI policy and compliance, helping businesses prepare for the next wave of AI regulation.

What You’ll Learn:
– Understanding ISO 42001 & the EU AI Act – How these frameworks shape AI governance and compliance.
– Risk & Compliance in AI Development – Addressing bias, transparency, and accountability in machine learning models.
– AI in Regulated Industries – Lessons from deploying AI in healthcare, finance, defence, and aerospace under strict regulations.
– Building Trustworthy AI – Strategies for ensuring safety, security, and ethical AI deployment.
– The Future of AI Regulation – How global policies evolve and what companies must prepare for.

Talk Title: Llama-Nemotron: Efficient Open Reasoning Models

Presenter:
Soumye Singhal, Research Scientist, NVIDIA

About the Speaker:
Soumye Singhal is a Research Scientist at NVIDIA, focusing on LLM post-training and alignment for Nemotron models. Recently he has contributed to the development of Llama-Nemotron reasoning models and Nemotron-Hybrid models. His research focuses on enhancing LLM performance through inference-time compute scaling and preference optimization techniques. Prior to joining NVIDIA, he completed his Master’s at Mila under Aaron Courville and his undergraduate studies at IIT Kanpur.

Track: Inference Scaling

Technical Level: 3/7

Abstract:
This talk introduces Llama-Nemotron, an open-source family of reasoning models delivering state-of-the-art reasoning capabilities with industry-leading inference efficiency. Available in three sizes—Nano (8B), Super (49B), and Ultra (253B)—these models surpass existing open reasoning models such as DeepSeek-R1, offering substantial improvements in inference throughput and memory efficiency.

The presentation will focus primarily on the specialized training methodology underlying these models. This includes a two-stage post-training pipeline: supervised fine-tuning (SFT) using carefully curated synthetic datasets to effectively distill advanced reasoning behaviors, and large-scale reinforcement learning (RL) with curriculum-driven self-learning to enable models to exceed teacher performance.

Additionally, the talk will briefly highlight innovations such as neural architecture search (NAS) for enhanced model efficiency, targeted inference-time optimizations, and a dynamic toggle for switching reasoning on or off, emphasizing their practical importance in real-world enterprise deployments.

What You’ll Learn:
Attendees will learn about effective methods for training powerful reasoning models using reasoning-focused supervised fine-tuning (SFT) and large-scale reinforcement learning (RL), enabling inference-time scaling and efficiency.

Talk Title: Training on AMD Instinct GPUs: From Pre-training to Fine-tuning and Post-training Strategies

Presenters:
Mehdi Rezagholizadeh, Director Software Development, AMD | Vikram Appia, Principal Member of Technical Staff, AMD

About the Speakers:
Mehdi is a Principal Member of the Technical Committee at AMD. Before joining AMD, he was a Principal Research Scientist at Huawei Noah’s Ark Lab Canada, where he worked since 2017 and served as the leader of the Canada NLP team for over six years. He focuses on efficient deep learning for NLP, computer vision, and speech, developing streamlined solutions for training, architecture, and inference.

Mehdi holds about 20 patents and has authored over 50 publications in leading conferences and journals, including TACL, NeurIPS, AAAI, ACL, NAACL, EMNLP, EACL, Interspeech, and ICASSP. Additionally, he has actively contributed to the academic and industrial communities by organizing prominent workshops, such as the NeurIPS Efficient Natural Language and Speech Processing (ENLSP) workshops (2021-2024) and by serving on technical committees for ACL, EMNLP, NAACL, and EACL, including as Area Chair and Senior Area Chair for NAACL 2024.

Vikram currently leads the Efficient GenAI team within the AI group at AMD. The team’s charter is to enable efficient inferencing and training at scale, and to release open-source models and recipes for the community to maximize performance on AMD GPUs. His team focuses on training and inferencing for various GenAI applications across LLMs, image/video generation, and multi-modal models.

Prior to joining AMD, Vikram spent about 7 years at Rivian Automotive, where his team was responsible for the development and execution of both the on-board and offboard (auto-labeling) Perception stacks for all Rivian vehicle programs. In his prior role at Texas Instruments in Dallas, he was a technical lead in the Perception and Analytics R&D labs.

He received his MS and PhD in ECE from Georgia Institute of Technology, Atlanta. He has authored over 40 US patents, and over 20 articles in refereed conferences and IEEE journals.

Track: Hardware Platforms

Technical Level: 4/7

Abstract:
Large foundation models—spanning large language models (LLMs), vision models, and multi-modal models—have revolutionized both academic research and industrial applications in AI. The computational power of GPUs has played a significant role in the success of these models, impacting their development, training, and inference. As the scope of these foundation models continues to expand, the choice of model architecture, training methods, training data, and hardware computational resources becomes increasingly vital.

This presentation explores various training efforts, such as pre-training, fine-tuning, and post-training, using AMD Instinct GPUs. We will delve into our public training dockers, highlighting key features designed to enhance user experience and improve accessibility.

The journey begins with pre-training methodologies, where we will review a snapshot of model performance metrics and demonstrate the benefits of leveraging Multi-GPU setups. Next, we will cover fine-tuning solutions, including Full Weight Fine-tuning and Parameter Efficient Tuning (PEFT) using Megatron-LM and HF-PEFT, showcasing how the larger HBM memory of the MI300X can lead to improved training performance and accuracy. Finally, we will address post-training strategies, including the innovative process of distilling multi-head attention (MHA) into more efficient solutions such as Mamba and Multi-Head Latent Attention (MLA) layers, aimed at optimizing model efficiency and deployment readiness. Our talk will provide practical insights and frameworks for implementing these advanced techniques alongside AMD Instinct GPUs.

What You’ll Learn:
Attendees will learn about the entire deep learning workflow from pre-training to post-training while leveraging the capabilities of AMD Instinct GPUs, particularly the MI300X. They will gain insights into effective training strategies, including Full Weight Fine-tuning and Parameter Efficient Tuning (PEFT), and understand the benefits of Multi-GPU setups for enhancing model performance. Additionally, the presentation will cover innovative post-training optimization techniques, such as distilling multi-head attention into more efficient structures, equipping attendees with practical frameworks to apply in their own AI projects.
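The Parameter Efficient Tuning (PEFT) mentioned above typically means LoRA-style adapters: freeze the pretrained weight and train only a small low-rank update. A minimal NumPy sketch of that idea (illustrative only, not the Megatron-LM or HF-PEFT implementation; the shapes and hyperparameters are placeholders):

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, w, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w = w                                            # frozen pretrained weight, (out, in)
        self.a = rng.normal(0.0, 0.02, size=(r, w.shape[1]))  # trainable down-projection
        self.b = np.zeros((w.shape[0], r))                    # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path plus adapter path; with B = 0 the adapter starts as a no-op.
        return x @ self.w.T + self.scale * (x @ self.a.T @ self.b.T)

    def trainable_params(self):
        return self.a.size + self.b.size

w = np.zeros((1024, 1024))               # stand-in for a pretrained projection matrix
layer = LoRALinear(w, r=8)
print(layer.trainable_params() / w.size)  # → 0.015625, i.e. about 1.6% of the full matrix
```

Only the two small matrices receive gradients; the full matrix stays frozen, which is where PEFT's memory savings over full-weight fine-tuning come from.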

Talk Title: Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Presenter:
Weiming Ren, PhD Student, University of Waterloo

About the Speaker:
Weiming Ren is a second year Ph.D. student at the Cheriton School of Computer Science, University of Waterloo, supervised by Prof. Wenhu Chen. His research interests include designing efficient model architectures and data curation pipelines to enhance large multimodal models (LMMs) for image and video understanding, as well as developing novel algorithms for controllable video generation, image and video editing, and image restoration.

Track: Multimodal LLMs

Technical Level: 3/7

Abstract:
State-of-the-art transformer-based large multimodal models (LMMs) struggle to handle hour-long video inputs due to the quadratic complexity of the causal self-attention operations, leading to high computational costs during training and inference. Existing token compression-based methods reduce the number of video tokens but often incur information loss and remain inefficient for extremely long sequences. In this work, we explore an orthogonal direction to build a hybrid Mamba-Transformer model (VAMBA) that employs Mamba-2 blocks to encode video tokens with linear complexity. Without any token reduction, VAMBA can encode more than 1024 frames (640×360) on a single GPU, while transformer-based models can only encode 256 frames. On long video input, VAMBA achieves at least 50% reduction in GPU memory usage during training and inference, and nearly doubles the speed per training step compared to transformer-based LMMs. Our experimental results demonstrate that VAMBA improves accuracy by 4.6% on the challenging hour-long video understanding benchmark LVBench over prior efficient video LMMs, and maintains strong performance on a broad spectrum of long and short video understanding tasks.

What You’ll Learn:
We develop a novel hybrid Mamba-Transformer model and show that hybrid models can achieve strong results on long video understanding tasks.
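The scaling claim in the abstract (1024 frames versus 256) comes down to quadratic versus linear cost in token count. A toy back-of-the-envelope calculation; the tokens-per-frame constant is a made-up placeholder, not Vamba's actual tokenizer output:

```python
TOKENS_PER_FRAME = 196  # hypothetical number of visual tokens per frame

def attention_cost(num_tokens):
    # Causal self-attention touches O(n^2) query-key pairs.
    return num_tokens ** 2

def ssm_cost(num_tokens):
    # A Mamba-style selective scan is linear in sequence length.
    return num_tokens

for frames in (256, 1024):
    n = frames * TOKENS_PER_FRAME
    print(frames, attention_cost(n) // ssm_cost(n))
# The relative cost of attention over a linear scan equals n itself,
# so 4x the frames widens the gap by 4x.
```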

Workshop: Stop RESTing -- Wake up your AI with MCP

Presenter:
W. Ian Douglas, Developer Advocate, Block Open Source Developer Platforms

About the Speaker:
Ian has been working in engineering and API architecture for most of his career in tech. He has focused on developer advocacy and technical education for the past nine years, and loves to teach new skills at meetups, workshops, and talks. He’s currently working at Block on the Open Source Developer Programs team, learning all about how AI can interact with APIs and MCP.

Track: Agents Zero To Hero

Technical Level: 2/7

Abstract:
Tired of your AI sleeping on the job? Time to wake it up with some real-world data. In this hands-on workshop, you’ll transform a basic REST API into a powerful tool your AI agent can actually USE. No more static responses – we’re building bridges between AI and APIs using Model Context Protocol (MCP).

What you’ll build:
– Your first MCP wrapper (spoiler: it’s simpler than you think)
– A bridge between web APIs and AI agents that actually works
– A live demo that proves your AI agent isn’t just making things up

The 45-minute coding journey breaks down into:
– Quick dive into MCP’s API superpowers
– Roll up your sleeves and build an MCP wrapper
– Watch your AI agent flex its new API muscles

Bring your laptop with Python installed – we’re coding this together. While we’ll have sample code on hand if you need it, the real fun is in building it yourself. Basic Python skills and REST API knowledge will help, but if you can write a “for” loop and know what an API endpoint is, you’re ready to roll.

What You’ll Learn:
We’ll be learning how MCP relates to APIs, specifically around RESTful APIs, and how to build an MCP server for a RESTful API, so any AI Agent can access dynamic data.
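As a taste of what the workshop builds, the pattern an MCP server formalizes is "functions registered as tools, dispatched by name from a structured request." A stdlib-only sketch of that idea (deliberately not the real MCP SDK; the endpoint and tool names are hypothetical):

```python
import json
import urllib.request

TOOLS = {}

def tool(fn):
    """Register a function so an agent can discover and call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add_line_items(prices: list) -> float:
    """A pure tool: no network needed."""
    return round(sum(prices), 2)

@tool
def get_order(order_id: str) -> dict:
    """Wraps a (hypothetical) REST endpoint as a tool an agent can call."""
    url = f"https://api.example.com/orders/{order_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def handle_call(request: dict) -> dict:
    """Dispatch one structured tool call, MCP-style: {"name": ..., "arguments": {...}}."""
    fn = TOOLS[request["name"]]
    return {"result": fn(**request["arguments"])}
```

The real MCP SDK adds schema discovery, transports, and protocol plumbing on top of this shape, which is exactly the plumbing the workshop walks through.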

Talk Title: Role of Memory and Personalization in AI Systems

Presenter:
Kaustubh Prabhakar, Member of Technical Staff, OpenAI

About the Speaker:
Member of Technical Staff at OpenAI working on Memory and Personalization

Track: Future Trends

Technical Level: 2/7

Abstract:
TBD

What You’ll Learn:
Role of Memory and Personalization in AI systems

Talk Title: Building Agents from Scratch

Presenter:
Hamza Farooq, CEO & Founder, Traversaal.ai | Adjunct Stanford

About the Speaker:
Hamza Farooq is an AI Startup founder, educator, researcher, and practitioner with years of experience in cutting-edge AI development. He has worked with global organizations, governments, and top universities, including Stanford and UCLA, to design and deploy state-of-the-art AI solutions. Hamza is the author of Building LLM Applications from Scratch and the founder of Traversaal.ai, a company specializing in Enterprise Knowledge Management and AI guardrails.

Known for his engaging teaching style and deep technical expertise, Hamza has trained thousands of students and professionals to master AI concepts and build production-ready applications.

Track: Agents Zero To Hero

Technical Level: 2/7

Abstract:
This workshop is designed to provide participants with a comprehensive understanding of designing and building AI agents from the ground up. Moving beyond reliance on pre-built frameworks like CrewAI or Autogen, this session emphasizes learning the core mechanics of agent development to enable fully customizable solutions.

Led by Hamza Farooq, a seasoned AI expert and educator, the workshop is both technically rigorous and highly practical. Participants will gain hands-on experience in building intelligent agents capable of autonomous decision-making, task orchestration, and real-world problem-solving.

By the end of the workshop, attendees will walk away with the knowledge and tools needed to develop robust, scalable, and production-grade AI agents tailored to their specific use cases.

What You’ll Learn:
1. Learn Core Fundamentals: Understand the architecture and foundational concepts of AI agents, including reasoning frameworks, decision trees, and multi-agent orchestration.
2. Build Agents from Scratch: Gain hands-on experience in coding AI agents from the ground up, bypassing pre-built frameworks for maximum customization.
3. Implement Advanced Techniques: Explore cutting-edge approaches like semantic chunking, task decomposition, and performance optimization for agents.
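Stripped of frameworks, the core of point 2 is a plan-act-observe loop. A minimal sketch with an invented `llm` interface that returns structured decisions (a real implementation would call a model API at that point):

```python
def run_agent(goal, llm, tools, max_steps=5):
    """Plan-act-observe: each turn the model either calls a tool or answers."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = llm("\n".join(history))
        if "answer" in decision:
            return decision["answer"]
        result = tools[decision["tool"]](**decision["args"])
        history.append(f"Observation: {decision['tool']} -> {result}")
    return None  # budget exhausted without a final answer

# Scripted stand-in for a real model call, just to exercise the loop.
def scripted_llm(prompt):
    if "Observation" in prompt:
        return {"answer": "42 doubled is 84"}
    return {"tool": "double", "args": {"x": 42}}

print(run_agent("double 42", scripted_llm, {"double": lambda x: 2 * x}))
# → 42 doubled is 84
```

Everything else the workshop covers, such as task decomposition and multi-agent orchestration, layers on top of this loop.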

Talk Title: Case Study: How Does DeepSeek's FlashMLA Speed Up Inference

Presenter:
Shashank Shekhar, Independent Researcher

About the Speaker:
Shashank Shekhar is an independent researcher and consultant who has worked with startups and companies to help them build and scale data pipelines, machine learning models, and evaluation systems. Companies he has consulted for include the Vector Institute, Cohere, Erode AI, NextAI, and Shell. Prior to this, he founded Dice Health, where he built real-time speech and language AI solutions for healthcare providers, steering the company from inception to profitability. Before that, he was a researcher working on scaling laws, reasoning, and interpretability at Meta AI, the Vector Institute, and the Indian Institute of Science. His research has been cited over 1,800 times and has won several awards, including a Best Paper award at NeurIPS 2022.

Track: Inference Scaling

Technical Level: 4/7

Abstract:
DeepSeek has revolutionized the AI landscape with their groundbreaking DeepSeek-V3 and R1 models. Behind the impressive performance of these models are several ingenious optimizations in both the algorithmic and computational aspects of the attention mechanism. We will set the stage for FlashMLA with an analysis of attention mechanisms in large language models. We’ll examine the algorithmic bottlenecks inherent in traditional attention implementations and introduce DeepSeek’s Multi-Head Latent Attention (MLA) as an algorithmic solution to these scaling challenges.

Building on this algorithmic foundation, we’ll pivot to compute-specific performance constraints that limit attention implementations and consequently, inference speed. We will discuss FlashAttention, a GPU aware algorithm that addresses these limitations through innovative memory access patterns. The presentation culminates in an in-depth look at how DeepSeek ingeniously combines these complementary concepts in their FlashMLA implementation, resulting in dramatically accelerated LLM inference without sacrificing model quality.

What You’ll Learn:
After this talk, attendees will be able to answer the following questions:

1. How does the complexity of attention mechanisms create a fundamental scaling bottleneck as context length increases, and what are the practical implications for training and deployment?
2. What are the tradeoffs between memory footprint and computational efficiency when implementing KV caching, and how do these tradeoffs influence system design decisions?
3. In what ways do various attention variants like MLA fundamentally transform the attention computation paradigm?
4. Why is the distinction between compute-bound versus memory-bound algorithms crucial for optimizing performance on modern GPU architectures, and how does this reframe our approach to attention implementations?
5. How can hardware-aware algorithm design (e.g. for attention) dramatically outperform naive implementations even without changing the mathematical operation being performed?
6. What memory access pattern inefficiencies does online softmax computation elegantly solve that traditional implementations struggle with?
7. How does FlashAttention’s approach to memory management and I/O optimization speed up attention computation while maintaining mathematical equivalence?
8. How does FlashMLA combine the algorithmic benefits of Multi-Head Latent Attention with the hardware-optimized implementation techniques of FlashAttention?
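On question 6: online softmax keeps a running maximum, normalizer, and weighted accumulator, rescaling past work whenever the maximum changes, so an attention output can be computed in one streaming pass over the scores. A scalar-valued sketch of the recurrence (FlashAttention applies the same idea blockwise to tiles held in SRAM):

```python
import math

def online_softmax_attn(scores, values):
    """One streaming pass: running max m, normalizer d, accumulator acc.
    When the max changes, previously accumulated terms are rescaled by
    exp(m_old - m_new), preserving exact equivalence with plain softmax."""
    m, d, acc = float("-inf"), 0.0, 0.0
    for s, v in zip(scores, values):
        m_new = max(m, s)
        rescale = math.exp(m - m_new) if d > 0.0 else 0.0
        w = math.exp(s - m_new)
        d = d * rescale + w
        acc = acc * rescale + w * v
        m = m_new
    return acc / d
```

Because each score is visited once and nothing the size of the full attention matrix is ever materialized, the pass is memory-bound-friendly, which is the property both FlashAttention and FlashMLA exploit.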

Talk Title: Rethinking Unsupervised Learning

Presenter:
Leland McInnes, Researcher, Tutte Institute

About the Speaker:
Leland McInnes is a researcher at the Tutte Institute for Mathematics and Computing in Ottawa, Canada. He works in unsupervised learning, and topological techniques for machine learning. Among his contributions are the UMAP algorithm for dimension reduction, and the accelerated HDBSCAN algorithm for clustering. He maintains many open source data science tools, including UMAP, HDBSCAN, PyNNDescent, DataMapPlot and Toponymy.

Track: Traditional ML

Technical Level: 4/7

Abstract:
Unsupervised learning is a diverse field that includes clustering, dimension reduction, anomaly detection, and density estimation. Many of the algorithms in the field are decades old and designed for low-dimensional tabular data. Now, with neural embeddings unlocking unstructured data, we face a world of high-dimensional data where old assumptions and intuitions do not hold. We’ll look at why classical unsupervised learning problems are still incredibly relevant today, why high-dimensional data breaks many of the standard algorithms, and how we can start to move forward and build new algorithms designed from the ground up for the high-dimensional data of the modern world.

What You’ll Learn:
We need to rethink classical unsupervised learning (clustering, anomaly detection, etc.) in light of high-dimensional data representations from neural embedding methods.
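One concrete way high dimensionality breaks classical algorithms is distance concentration: nearest and farthest neighbors become nearly equidistant, starving distance-based clustering and anomaly detection of signal. A small stdlib Monte Carlo illustration (the point counts and unit-cube setup are arbitrary choices):

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(d_max - d_min) / d_min over distances from the origin to random
    points in the unit cube; small values mean 'everything is equally far'."""
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

print(relative_contrast(2), relative_contrast(1000))
# The contrast collapses as dimension grows: in 2D the ratio is large,
# in 1000D nearly all points sit in a thin shell at similar distance.
```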

Talk Title: From One Voice to Forty: Inside Vimeo’s Dubbing Engine

Presenter:
Tanushree Nori, Principal Data Scientist, Vimeo

About the Speaker:
Tanushree Nori is a Principal Data Scientist at Vimeo, where for the past 4½ years she’s built LLM-powered features—Video Insights, Chapters, Summaries, Highlights, and multilingual Dubbing—that help viewers unlock more value from every upload. Her earlier work on cloud-storage optimization now saves the platform about $1 million annually and was showcased at Demuxed 2024 in San Francisco. When she isn’t dissecting LLM evals for fun, Tanushree is dancing—bringing her Indian classical foundation into hip-hop and house for a trippy, vibrant fusion.

Track: AI for Productivity Enhancements

Technical Level: 3/7

Abstract:
Imagine every video greeting viewers in their own language—no studio booth, no red-eye caption sprints. Vimeo’s new pipeline turns a single upload into time-locked captions and natural-sounding dubs, almost as fast as the video plays.
1. Gemini Flash 2.0 handles translating transcripts fast enough that you can watch progress in real-time.
2. Careful chain-of-thought prompting coaxes phoneme-level detail out of the translations, so we can contract roomy German syllables or subtly expand packed Mandarin ones before the subtitles and dubs wander off-beat.
3. Our chunking strategy pins every subtitle segment to its timestamp, keeping drift under 10 ms.
4. Spot a rare error in the subs? Segment-level re-editing lets you fix a single line; only that slice gets re-translated (and re-dubbed if desired).
5. A creative and thorough eval framework to run translation experiments.

We’ll share the system design, the prompting tricks, the timing math, and a few war stories from when subs and dubs went rogue, plus the metrics and eval methodology that convinced us the system was ready for production.

What You’ll Learn:
A reasonably detailed how-to on turning a single video into accurate subtitles and dubs in many languages—using LLM prompt tricks, phoneme-aware timing, and a chunk-based pipeline that stays fast, editable, and production-ready.
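The segment-pinning and re-edit ideas in points 3 and 4 can be sketched as pure data flow: timing always comes from the source segment, and a fix touches exactly one slice. The record shapes below are invented for illustration, not Vimeo's actual schema:

```python
def retime_segments(segments, translated_texts):
    """Pin each translated line to its source segment's (start, end):
    timing is copied from the original audio, so translations cannot drift."""
    assert len(segments) == len(translated_texts)
    return [
        {"start": seg["start"], "end": seg["end"], "text": text}
        for seg, text in zip(segments, translated_texts)
    ]

def reedit_segment(dubbed, index, new_text):
    """Fix one slice without touching its neighbors; only this segment
    would need re-translation (and re-dubbing)."""
    fixed = dict(dubbed[index], text=new_text)
    return dubbed[:index] + [fixed] + dubbed[index + 1:]
```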

Who Attends

Attendees
Data Practitioners
Researchers/Academics
Business Leaders

2023 Event Demographics

Highly Qualified Practitioners
Currently Working in Industry*
Attendees Looking for Solutions
Currently Hiring
Attendees Actively Job-Searching

2023 Technical Background

Expert/Researcher
18.5%
Advanced
44.66%
Intermediate
27.37%
Beginner
9.39%

2023 Attendees & Thought Leadership

Attendees
Speakers
Company Sponsors

Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, and principles for achieving ROI.

Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: Will have the opportunity to network virtually and meet 30+ top AI companies.

What is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.

The result is a fun and engaging five-minute presentation.

You can see all our speakers and full agenda here

Get our official conference app
For BlackBerry or Windows Phone, click here
For feature details, visit Whova