TMLS 6th Annual Conference & Expo 2022 – Register today to ensure workshop seating

Toronto Machine Learning Summit

Meet Our Speakers

Advanced Technical / Research

Ihab Ilyas

Director, Apple
Talk: Saga: Continuous Construction and Serving of Large Scale Knowledge Graphs

Chip Huyen

CEO, Claypot AI
Talk: Real-time Machine Learning: Architecture and Challenges

Anne Martel

Professor, University of Toronto
Talk: Artificial Intelligence And Digital Pathology: Making The Most of Limited Annotated Data

Varun Raj Kompella

Senior Research Scientist, Sony AI
Talk: Outracing Champion Gran Turismo Drivers With Deep Reinforcement Learning

Bo Chang

Software Engineer, Google Brain
Talk: Latent User Intent Modeling in Recommender Systems

Jinen Setpal

Machine Learning Engineer, DagsHub
Talk: Interpretability Tools are Feedback Loops

Jesse Cresswell

Senior Machine Learning Scientist, Layer 6 AI
Talk: Navigating the Tradeoff Between Privacy and Fairness in ML

Sophia Yang

Senior Data Scientist, Anaconda
Talk: PyScript for Data Science

Steven Waslander

Director, Toronto Robotics and AI Laboratory, University of Toronto
Talk: Where's the Road? The Challenge of Autonomous Driving Perception in Winter

Nestor Maslej

Research Manager, AI Index, Stanford Institute for Human-Centered AI
Talk: 2022 AI Index Report Briefing

Jordan Shaw

Creative Technology Lead, Half Helix
Talk: Tips & Tricks for Intentional Text-to-Image Generation

Ali Sabet

ML Engineer, Cohere
Lightning Talks w/ Open Group Discussion - Text Generation Using Large Language Models

Royal Sequeira

AI Research Scientist, LG Electronics Toronto AI Lab
Lightning Talks w/ Open Group Discussion - Text Generation Using Large Language Models

Piero Molino

CEO & Co-Founder, Predibase
Talk: Declarative Machine Learning Systems: Ludwig & Predibase

Business Strategy

Liran Hason

CEO, Aporia
Talk: The Framework for Great ML Products

Danielle Goldfarb

Vice President Global Affairs, Economics and Public Policy, RIWI
Talk: The Risks of Excluding the Disengaged From your Dataset

David Van Bruwaene

Founder and CEO, Fairly AI
Talk: Managing AI and the Associated Risks in a Regulatory Environment

Fion Lee-Madan

Co-Founder and COO, Fairly AI
Talk: Managing AI and the Associated Risks in a Regulatory Environment

Susie Lindsay

Counsel, Law Commission of Ontario
Talk: Managing AI and the Associated Risks in a Regulatory Environment

Shazia Akbar

Lead Machine Learning Engineer, Altis Labs
Panel: Bias and Relevance in Application of ML in Healthcare

Ali Madani

Director of Machine Learning, Cyclica
Panel: Bias and Relevance in Application of ML in Healthcare

Santosh Hariharan

Principal Scientist, Pfizer
Panel: Bias and Relevance in Application of ML in Healthcare

Shiva Amiri

VP, Head of AI and Data Intelligence, Pivotal Life Sciences
Panel: Bias and Relevance in Application of ML in Healthcare

Javier Diaz-Mejia

Head of Data Science, Phenomic AI
Panel: Bias and Relevance in Application of ML in Healthcare

Nicolas Venegas Oliva

Technical Lead of Advanced Analytics, LATAM Airlines
Talk: Data FAILS

Sarah Sun

Director Data Science, Scotiabank
Talk: Data FAILS

Case Studies

Muhammad Mamdani

Director, University of Toronto
Talk: Saving Lives with ML: Applications and Learnings

Nikita Medvedev

Director of Advanced Analytics, Coca Cola
Talk: The Application of Mobile Location Data for Vending Machine Site Selection and Revenue Optimization.

Winston Li

Founder, Arima
Talk: The Application of Mobile Location Data for Vending Machine Site Selection and Revenue Optimization.

Shiming Ren

Senior Engineering Manager, Amazon/Twitch
Talk: From Silo to Collaboration – Building Tooling to Support Distributed ML Teams at Twitch

Chen Liu

Senior Engineering Manager, Twitch
Talk: From Silo to Collaboration – Building Tooling to Support Distributed ML Teams at Twitch

Valerii Podymov

Lead Data Scientist, FreshBooks
Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Roshan Isaac

Machine Learning Engineer, FreshBooks
Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Vlad Ryzhkov

Senior Data Engineer, FreshBooks
Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Joey Zhou

Senior Data Engineer, FreshBooks
Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Eric Hart

Staff Data Scientist, Anheuser-Busch
Talk: Optimal Beer Pricing: An Optimization Layer for Price Elasticities

Jawad Ahmed

Staff Applied Scientist, Loblaw Digital
Talk: Solving Product Substitutions, A Big Problem in Grocery E-Commerce – Through Self-Supervised ML

Quoc Tien Au

Data Scientist, Manifest Climate
Talk: Assessing Alignment of Climate Disclosures Using NLP for the Financial Markets

Aysha Cotterill

Data Analyst, Manifest Climate
Talk: Assessing Alignment of Climate Disclosures Using NLP for the Financial Markets

Amish Popli

Data Scientist, SpotHero
Talk: Marketplace experimentation at SpotHero

Hien Luu

Head of Machine Learning Platform, DoorDash
Talk: Scaling & Evolving the Machine Learning Platform at DoorDash

Hanieh Arjmand

ML Researcher, Lydia.ai
Talk: Sensitivity and Interpretability of AI-Models

Spark Tseung

Applied Data Scientist, Lydia.ai
Talk: Sensitivity and Interpretability of AI-Models

Rohit Saha

Applied Research Scientist, Georgian
Talk: Transforming The Retail Industry with Transformers

Kyryl Truskovskyi

Applied Research Scientist, Georgian
Talk: Transforming The Retail Industry with Transformers

Nicolas Venegas Oliva

Technical Lead of Advanced Analytics, LATAM Airlines
Talk: Scaling Advanced Analytics in the Worst Crisis in the Industry Area

Cristóbal Guzmán Wilkendorf

Staff Data Scientist, LATAM Airlines
Talk: Scaling Advanced Analytics in the Worst Crisis in the Industry Area

Serena McDonnell

Lead Data Scientist, Delphia
Talk: The Role of Alternative Data in Investing

Hands-on Workshops

Chloe Pou-Prom

Data Scientists, Unity Health Toronto
Workshop: NLP for Healthcare: Challenges With Processing and De-Identifying Clinical Notes

Vaakesan Sundrelingam

Data Scientists, Unity Health Toronto
Workshop: NLP for Healthcare: Challenges With Processing and De-Identifying Clinical Notes

Stefanie Molin

Software Engineer / Data Scientist, Bloomberg
Workshop: Beyond the Basics: Data Visualization in Python

Patricia Thaine

Co-Founder & CEO, Private AI
Workshop: Demystifying De-Identification

Denys Linkov

ML Lead, Voiceflow
Workshop: Iterating on NLP Models from R&D to Production

Dr. Nasim Abdollahi

Machine Learning Researcher, Cyclica
Workshop: Graph Neural Network Modeling in Drug Discovery Using PyTorch

Dr. Farnoosh Khodakarami

Computer Scientist & ML Researcher, Cyclica
Workshop: Graph Neural Network Modeling in Drug Discovery Using PyTorch

Arthur Vitui

Senior Data Scientist Specialist Solution Architect, RedHat Canada
Workshop: Open Source Intelligent Application Delivery on Kubernetes

Jörg Schad

CTO, ArangoDB
Workshop: Graph ML – The Next Level of Machine Learning

Mahmudul Hasan

Lead Data Scientist, TELUS Business Marketing
Workshop: Introduction to NLP & a Step by Step Implementation of a Real World Use Case from TELUS

Akbar Nurlybayev

Co-Founder/VP of Engineering, CentML
Workshop: Train your Models Faster by Learning How to Profile and Apply System-Level Optimizations

Xin Li

Research Engineer, CentML
Workshop: Train your Models Faster by Learning How to Profile and Apply System-Level Optimizations

Yubo Gao

Research Engineer, CentML
Workshop: Train your Models Faster by Learning How to Profile and Apply System-Level Optimizations

Shagun Sodhani

Research Engineer, Meta AI
Workshop: Distributed Training with PyTorch

Eric Hammel

MLOps Engineer, Rocket Science Development
Hands-on Workshop: Introduction to Kubernetes for MLOps

Benjamin Ye

Applied Research Scientists, Georgian
Workshop: Time Series Anomaly Detection with Machine Learning

Angeline Yasodhara

Applied Research Scientists, Georgian
Workshop: Time Series Anomaly Detection with Machine Learning

Eric Huang

Founder & CEO, Advanced Analytics and Research Lab
Workshop: Four Data and Analytics Initiatives and Strategy to Achieve Excellence

Michael Woolfson

Client Lead & Development, Advanced Analytics and Research Lab
Workshop: Four Data and Analytics Initiatives and Strategy to Achieve Excellence

Virtual Workshops

Dan Adamson

CEO and Co-Founder, Armilla AI
Workshop: Testing for Fairness in AI HR Systems: Hidden Dangers and Real-World Lessons on How To Detect and Prevent Bias

Bhaskarjit Sarmah

Senior Data Scientist, BlackRock
Workshop: Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

Danny Chiao

Tech Lead, Tecton/Feast
Workshop: Building a Fraud Detection Model with Feature Stores (Includes Bonus Case Study: How Shopify uses Feast to Manage its ML Features)

Eddie Esquivel

Senior Solutions Architect, Tecton
Workshop: Building a Fraud Detection Model with Feature Stores (Includes Bonus Case Study: How Shopify uses Feast to Manage its ML Features)

Abhin Chhabra

ML Platform Tech Lead, Shopify
Workshop: Building a Fraud Detection Model with Feature Stores (Includes Bonus Case Study: How Shopify uses Feast to Manage its ML Features)

Annie En-Shiun Lee

Assistant Professor, University of Toronto
Workshop: Pre-Trained Multilingual Sequence-to-Sequence Models for NMT: Tips, Tricks and Challenges

Ed Shee

Head of Developer Relations, Seldon
Workshop: An Introduction to Drift Detection

Ashley Scillitoe

Data Science Research Engineer, Seldon
Workshop: An Introduction to Drift Detection

Roxana Sultan

Chief Data Officer and VP, Health, Vector Institute
Workshop: Cancer Image Segmentation

Dr. Benjamin Haibe-Kains

Senior Scientist, University Health Network
Workshop: Cancer Image Segmentation

Jun Ma

Postdoctoral Fellow, Vector Institute
Workshop: Cancer Image Segmentation

Ronald Xie

PhD Candidate, Vector Institute
Workshop: Cancer Image Segmentation

Rex Ma

PhD Candidate, Vector Institute
Workshop: Cancer Image Segmentation

Catalina Herrera

Principal Sales Engineer, Dataiku
Workshop: De-Risk Your AI Efforts by Removing Friction From Your MLOps Processes

Chris Helmus

Senior Sales Engineer, Dataiku
Workshop: De-Risk Your AI Efforts by Removing Friction From Your MLOps Processes

Milan Kordic

Senior Machine Learning Engineer, Tenstorrent
Workshop: Introducing the Tenstorrent Model Zoo

Amber Roberts

Machine Learning Engineer, Arize AI
Workshop: Troubleshooting your ML Models in Production

Tristan Zajonc

Co-Founder, Continual
Workshop: Automating Knowledge Work with Generative AI

Jim Olsen

Chief Technology Officer, ModelOp
Workshop: Building Automated Model Life Cycles To Show Data Science Business Contribution, Minimize the Impact of Regulation and Governance Requirements, and Keep the Freedom of Innovation

Marcelo Litovsky

Director of Sales Engineering, Aporia
Workshop: Observability is Critical to MLOps

James Cameron

Senior AI/ML Solutions Architect, NVIDIA
Workshop: Bringing An AI System From Proof of Concept to Deployment and Beyond

Kallie Levy

Software Engineer, Superwise
Workshop: A Guide to Putting Together a Continuous ML Stack

Rafal Orlowski

Director, Data Science, Scotiabank
Workshop: Launching Scotiabank's Customer Facing Chatbot for a Large Organization: From Cold Start Problem to Implementation

Fabio Dutra Sarti

Senior AI/ML Product Manager, Scotiabank
Workshop: Launching Scotiabank's Customer Facing Chatbot for a Large Organization: From Cold Start Problem to Implementation

Alex Kim

Solutions Engineer, Iterative.ai
Workshop: ML Experimentation with DVC and VS Code

Rajiv Shah

Machine Learning Engineer, Hugging Face
Workshop: Building AI Applications with Transformers

Andrew Jardine

Enterprise Account Executive, Hugging Face
Workshop: Building AI Applications with Transformers

Lightning Talks

Sarah Sun

Director Data Science, Scotiabank
Talk: Model Lifecycle in Banking

Mandy Gu

Engineering Manager, Wealthsimple
Talk: Deploying a Machine Learning Model in under 15 Minutes at Wealthsimple

Bhavani Rao

Technical Product Marketing Manager, Pachyderm
Talk: What To Look For In Your Next ML Pipeline

Oren Razon

Co-Founder & CEO, Superwise
Talk: Retraining won't Fix your Model (Always)

Geoffrey Hunter

Lead Data Scientist, SpotHero
Talk: 7 Questions for Data Scientists

Kai Luo

Senior Applied Scientist, Loblaw Digital
Talk: Recommender Systems at Loblaw Digital

Rex Lam

Director Machine Learning Platform, Autodesk
Talk: The Transformative Role of AI/ML in Heavy Industries

Robinson Garcia

R&D Project Manager & Technology Specialist, Petrobras
Talk: Digital Twin as A Tool for Industrial Asset Management

Amber Roberts

Machine Learning Engineer, Arize AI
Talk: Monitoring Unstructured Models in Production

Andrea Ruotolo

Global Head, Sustainability / ESG, Rockwell Automation
Talk: AI & Sustainability: A $50 Trillion Opportunity

This agenda is still subject to changes
Ihab Ilyas

Professor, Cheriton School of Computer Science, University of Waterloo
Director, Head of Apple Knowledge Platform, Apple

Ihab Ilyas is a professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on data quality at the University of Waterloo. He is currently on leave at Apple to lead the Apple Knowledge Platform team. His main research focuses on data science and data management, with a special interest in data quality and integration, managing uncertain data, machine learning for data curation, and information extraction. Ihab is a co-founder of Tamr, a startup focusing on large-scale data integration, and a co-founder of inductiv (acquired by Apple), a Waterloo-based startup that uses AI for structured data cleaning. He is an ACM Fellow and IEEE Fellow, a recipient of the Ontario Early Researcher Award, a Cheriton Faculty Fellowship, an NSERC Discovery Accelerator Award, and a Google Faculty Award. Ihab was an elected member of the VLDB Endowment board of trustees (2016-2021), elected SIGMOD vice chair (2016-2021), an associate editor of the ACM Transactions on Database Systems (2014-2020), and an associate editor of Foundations of Database Systems. He holds a PhD in computer science from Purdue University, West Lafayette.

Talk: Saga: Continuous Construction and Serving of Large Scale Knowledge Graphs

Abstract: In this talk I present Saga, an end-to-end platform for incremental and continuous construction of large-scale knowledge graphs that we built at Apple. Saga demonstrates the complexity of building such a platform in industrial settings with strong consistency, latency, and coverage requirements. In the talk, I will discuss challenges around the following: building source adapters for ingesting heterogeneous data sources; building entity linking and fusion pipelines for constructing coherent knowledge graphs that adhere to a common controlled vocabulary; updating the knowledge graphs with real-time streams; and finally, exposing the constructed knowledge via a variety of services. Graph services include: low-latency query answering; graph analytics; ML-based entity disambiguation and semantic annotation; and other graph-embedding services to power multiple downstream applications. Saga is used in production at large scale to power a variety of user-facing knowledge features.

What You’ll Learn: Complexity of building large scale knowledge graphs

Track: Technical

Technical Level: 5

Location: Seattle

Talk: Saga: Continuous Construction and Serving of Large Scale Knowledge Graphs

Presenter:
Ihab Ilyas, Professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on data quality at the University of Waterloo

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 5/7)

Are there any industries (in particular) that are relevant for this talk?
Information Technology & Service

What are the main core message (learning) you want attendees to take away from this talk?
Complexity of building large scale knowledge graphs.

Chip Huyen

CEO, Claypot AI

Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning. Previously, she was with Snorkel AI and NVIDIA. She teaches CS 329S: Machine Learning Systems Design at Stanford. She’s the author of Designing Machine Learning Systems, an Amazon bestseller in AI. She has also written four bestselling Vietnamese books.

Talk: Real-time Machine Learning: Architecture and Challenges

Abstract: Fresh data beats stale data for machine learning applications. This talk discusses the value of fresh data as well as different types of architecture and challenges of online prediction.

What You’ll Learn: Fresh data beats stale data for machine learning applications

Track: Technical

Technical Level: 5

Location: San Francisco

Talk: Real-time Machine Learning: Architecture and Challenges

Presenter:
Chip Huyen, CEO, Claypot AI

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 5/7)

Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Computer Software, Information Technology & Service, Insurance, Marketing & Advertising

What are the main core message (learning) you want attendees to take away from this talk?
Fresh data beats stale data for machine learning applications

Anne Martel

Professor, University of Toronto

Anne Martel is a Professor in Medical Biophysics at the University of Toronto, the Tory Family Chair in Oncology at Sunnybrook Research Institute, and a Faculty Affiliate at the Vector Institute, Toronto. Her research program is focused on medical image and digital pathology analysis, particularly on the development of self-supervised and weakly supervised methods for segmentation, diagnosis, and prediction/prognosis. In 2006 she co-founded Pathcore, a software company developing complete workflow solutions for digital pathology.

Dr. Martel is an active member of the medical image analysis community and is a fellow of the MICCAI Society, which represents engineers and computer scientists working in this field. She has served as a board member of MICCAI and is currently on the editorial board of Medical Image Analysis, one of the leading journals in the field.

Talk: Artificial Intelligence And Digital Pathology: Making The Most of Limited Annotated Data

Abstract: Obtaining large datasets with detailed annotations for medical imaging AI projects is a time-consuming and expensive process, as it usually requires the input of expert radiologists and pathologists. Collecting data to train outcome prediction models is even more challenging, as the number of patients with both imaging and follow-up data may be small, and only weak labels are available.

This talk will describe several semi-supervised and self-supervised approaches which can make more efficient use of small and/or weakly labelled datasets. The focus will be on digital pathology, but the methods described are applicable to any medical imaging modality.

What You’ll Learn: Self-supervision and smart sampling strategies are essential in digital pathology

Track: Advanced Technical/Research

Technical Level: 6

Location: Toronto

Talk: Artificial Intelligence And Digital Pathology: Making The Most of Limited Annotated Data

Presenter:
Anne Martel, Professor, University of Toronto

Which talk track does this best fit into?
Advanced Technical / Research

Technical level of your talk?
(Technical Level: 6/7)

What you’ll learn:
Self-supervision and smart sampling strategies are essential in digital pathology

Varun Raj Kompella

Senior Research Scientist, Sony AI

Varun Kompella is currently a senior research scientist at Sony AI. He earned his master of science degree in informatics with a specialization in graphics, vision and robotics from Institut National Polytechnique de Grenoble (INRIA Grenoble), and a Ph.D. degree from Università della Svizzera Italiana (IDSIA Lugano), Switzerland, working with Prof. Juergen Schmidhuber. In his thesis work he developed algorithms that use the slowness principle for driving exploration in reinforcement learning agents. After completing his Ph.D., he worked as a postdoctoral researcher at the Institute for Neural Computation (INI), Germany. His research contributions have led to several patents and to publications in peer-reviewed journals and conference proceedings.

Talk: Outracing Champion Gran Turismo Drivers With Deep Reinforcement Learning

Abstract: Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics.

In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.

What You’ll Learn: We demonstrate the possibilities and challenges of using deep RL techniques to control complex dynamical systems in domains such as Gran Turismo where agents must respect imprecisely defined human norms.

Track: Technical / Research

Technical Level: 7

Location: Ottawa

Talk: Outracing Champion Gran Turismo Drivers With Deep Reinforcement Learning

Presenter:
Varun Raj Kompella, Senior Research Scientist, Sony AI

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical Level: 7/7)

What you’ll learn:
We demonstrate the possibilities and challenges of using deep RL techniques to control complex dynamical systems in domains such as Gran Turismo where agents must respect imprecisely defined human norms.

Bo Chang

Software Engineer, Google Brain

Bo Chang is a software engineer at Google Brain, based in Toronto, Canada. Prior to that, he was a machine learning researcher at Borealis AI. He finished his Ph.D. in statistics at the University of British Columbia.

Talk: Latent User Intent Modeling in Recommender Systems

Abstract: Current sequential recommender systems mainly rely on users’ item-level interaction history to capture topical interests and lack a high-level understanding of user intent. It is challenging to explicitly define and enumerate all possible user intents. We propose to use latent variable models to capture user intents as latent variables through encoding and decoding user behavior signals, with an application to a large industrial recommender system.

What You’ll Learn: How to better model user intent in recommender systems using a latent variable model.

Track: Advanced Technical / Research

Technical Level: 7

Location: Toronto

Talk: Latent User Intent Modeling in Recommender Systems

Presenter:
Bo Chang, Software Engineer, Google Brain

Which talk track does this best fit into?
Advanced Technical / Research

Technical level of your talk?
(Technical Level: 7/7)

What you’ll learn:
How to better model user intent in recommender systems using a latent variable model.

Jinen Setpal

Machine Learning Engineer, DagsHub

I’m a second-year undergraduate studying Data Science at Purdue University. I also work part-time as a Machine Learning Engineer @ DagsHub.

I love research – especially within academia! My interests lie firmly within Machine Vision, NLP & Cybersecurity; so far, I’ve published some peer-reviewed papers and have pending patents within these domains.

Talk: Interpretability Tools are Feedback Loops

Abstract: Fundamentally – Machine Learning as a field is designed to emulate the way humans think; hence, *neural* networks. When we train our models, we use optimizers and loss functions to measure their success. While these functions make sense mathematically, they are far from intuitive and do little to explain what happened behind the scenes. It’s hard to pick the correct functions, and performing huge grid searches to tune hyperparameters at scale is as logical as brute-forcing a SHA-256 hash.

On the other hand – Interpretability techniques can’t really be used in a training context but are intuitive in helping us understand how a given model interprets a set of data.

This talk aims to bridge the gap between the two, connecting them within a single training loop to maximize training effectiveness without disproportionately increasing compute or training time. Making training intuitive to how humans learn should help develop models that actually work, without resorting to “useless” training.

I aim to showcase – with a practical demonstration – learning techniques to build feedback loops wherein interpretability is used to better optimize a training sequence. I also aim to discuss how this carries forward to complex architectures, and a potential approach for their relevant implementation.

Structurally: the talk would provide an overview of machine interpretability and a brief overview of optimizers and loss functions before jumping into the implementation walkthrough of a case study. The case study uses TensorFlow, but the approach can generally be applied to any desired framework.

What You’ll Learn: If, by the end of my presentation, attendees are able to identify techniques for applying the proposed approach within their internal systems, or find themselves motivated to further research the ideas presented, I’d consider the talk a success.

Track: Technical

Technical Level: 6

Location: West Lafayette, Indiana, United States

Talk: Interpretability Tools are Feedback Loops

Presenter:
Jinen Setpal, Machine Learning Engineer, DagsHub

About the Speaker:
I’m a second-year undergraduate studying Data Science at Purdue University. I also work part-time as a Machine Learning Engineer @ DagsHub.

I love research – especially within academia! My interests lie firmly within Machine Vision, NLP & Cybersecurity; so far, I’ve published some peer-reviewed papers and have pending patents within these domains.

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 6/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Researchers within Academia

Who is this presentation for?
Data Scientists / ML Engineers, Researchers

What you’ll learn:
Most guides on Interpretable Machine Learning, while focusing on the seemingly black-box nature of models, fail to address the challenge that interpretability only works post-training. It’s treated as a final check to ensure the model isn’t learning tragically differently from what the researcher expects, but not directly as part of the training feedback loop. I aim to address this.

What is the main core message (learning) you want attendees to take away from this talk?
If, by the end of my presentation, attendees are able to identify techniques for applying the proposed approach within their internal systems, or find themselves motivated to further research the ideas presented, I’d consider the talk a success.

Pre-requisite Knowledge:
It’s a technical presentation, and requires participants to be familiar at a high level with model optimization techniques and machine interpretability. They should also be very familiar with the standard classification pipeline of a feedforward neural network.

What is unique about this speech, from other speeches given on the topic?
I work extensively on ML Reproducibility. In fact, it is the fundamental work done within my research grant funded by Google. Time and again, I find that papers tend to document the end-result and the problem statement extensively, but not everything in between. That’s where the real learning takes place, understanding what DIDN’T work and WHY.

This is rarely ever documented. Papers list working parameters, both functions and hyperparameters, but fail to explain why. I’ve struggled a lot with this and hope to relay potential solutions through the extent of my presentation.

Abstract of Talk:
Fundamentally – Machine Learning as a field is designed to emulate the way humans think; hence, *neural* networks. When we train our models, we use optimizers and loss functions to measure their success. While these functions make sense mathematically, they are far from intuitive and do little to explain what happened behind the scenes. It’s hard to pick the correct functions, and performing huge grid searches to tune hyperparameters at scale is as logical as brute-forcing a SHA-256 hash.

On the other hand – Interpretability techniques can’t really be used in a training context but are intuitive in helping us understand how a given model interprets a set of data.

This talk aims to bridge the gap between the two, connecting them within a single training loop to maximize training effectiveness without disproportionately increasing compute or training time. Making training intuitive to how humans learn should help develop models that actually work, without resorting to “useless” training.

I aim to showcase – with a practical demonstration – learning techniques to build feedback loops wherein interpretability is used to better optimize a training sequence. I also aim to discuss how this carries forward to complex architectures, and a potential approach for their relevant implementation.

Structurally: the talk would provide an overview of machine interpretability and a brief overview of optimizers and loss functions before jumping into the implementation walkthrough of a case study. The case study uses TensorFlow, but the approach can generally be applied to any desired framework.

Can you suggest 2-3 topics for post-discussion?
Information Retrieval, Explainable AI, relevance of the above in product development

Jesse Cresswell

Senior Machine Learning Scientist, Layer 6 AI

Jesse is a Senior Machine Learning Scientist at Layer 6 AI within TD, and is the Team Lead for Credit Risk. His applied work centers on building machine learning models in high risk and highly regulated domains. Jesse leads research on privacy enhancing technologies for machine learning including topics of Federated Learning and Differential Privacy.

Talk: Navigating the Tradeoff Between Privacy and Fairness in ML

Abstract: As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy enhancing technologies often worsens unfair tendencies in models. In this talk we address the intersection of privacy and fairness in machine learning, and offer research-based solutions for navigating the tradeoffs.

What You’ll Learn: Applying privacy enhancing technologies can increase bias and unfairness in ML models. Practitioners need to consider the intersection of these important ethical ideas.

Track: Technical

Technical Level: 5

Location: East York

Talk: Navigating the Tradeoff Between Privacy and Fairness in ML

Presenter:
Jesse Cresswell, Senior Machine Learning Scientist, Layer 6 AI

About the Speaker:
Jesse is a Senior Machine Learning Scientist at Layer 6 AI within TD, and is the Team Lead for Credit Risk. His applied work centers on building machine learning models in high risk and highly regulated domains. Jesse leads research on privacy enhancing technologies for machine learning including topics of Federated Learning and Differential Privacy.

Which talk track does this best fit into?
Technical

Technical level of your talk?
(Technical Level: 5/7)

What you’ll learn:
Applying privacy enhancing technologies can increase bias and unfairness in ML models. Practitioners need to consider the intersection of these important ethical ideas.

Abstract of Talk:
As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy enhancing technologies often worsens unfair tendencies in models. In this talk we address the intersection of privacy and fairness in machine learning, and offer research-based solutions for navigating the tradeoffs.

Sophia Yang

Senior Data Scientist, Anaconda

Sophia Yang is a Senior Data Scientist at Anaconda, Inc., where she uses data science to facilitate decision-making for various departments across the company. She volunteers as a Project Incubator at NumFOCUS to help Open Source Scientific projects grow. She is also the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and Ph.D. in Educational Psychology from The University of Texas at Austin.

Talk: PyScript for Data Science

Abstract: Are you a data scientist or a developer who mostly uses Python? Are you jealous of developers who write JavaScript code and build fancy websites in a browser? How nice would it be if we could write websites in Python? PyScript makes it possible! PyScript allows users to write Python in the browser. In this talk, I will introduce PyScript and discuss what PyScript means for data scientists, how PyScript might change the way data scientists work, and how PyScript can be incorporated into the data science workflow.

What You’ll Learn: Use PyScript to run Python in Your HTML

Track: Technical

Technical Level: 4

Location: Austin

Talk: PyScript for Data Science

Presenter:
Sophia Yang, Senior Data Scientist, Anaconda

About the Speaker:
Sophia Yang is a Senior Data Scientist at Anaconda, Inc., where she uses data science to facilitate decision-making for various departments across the company. She volunteers as a Project Incubator at NumFOCUS to help Open Source Scientific projects grow. She is also the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and Ph.D. in Educational Psychology from The University of Texas at Austin.

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 3/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists / ML Engineers and High-level Researchers

What you’ll learn:
PyScript was announced early this year. There are not many tutorials or much other content online yet.

Pre-requisite Knowledge:
Knowledge of Python is recommended

What is unique about this speech, from other speeches given on the topic?
I will introduce PyScript from a data science perspective.

Abstract of Talk:
Are you a data scientist or a developer who mostly uses Python? Are you jealous of developers who write JavaScript code and build fancy websites in a browser? How nice would it be if we could write websites in Python? PyScript makes it possible! PyScript allows users to write Python in the browser. In this talk, I will introduce PyScript and discuss what PyScript means for data scientists, how PyScript might change the way data scientists work, and how PyScript can be incorporated into the data science workflow.

Can you suggest 2-3 topics for post-discussion?
Visualization, Model Deployment

Talk: Where's the Road? The Challenge of Autonomous Driving Perception in Winter

Presenter:
Steven Waslander, Professor, Institute for Aerospace Studies / Director, Toronto Robotics and AI Laboratory, University of Toronto

About the Speaker:
Prof. Steven Waslander is a leading authority on autonomous aerial and ground vehicles, including multirotor drones and autonomous driving vehicles. He received his B.Sc.E. in 1998 from Queen’s University, and his M.S. in 2002 and his Ph.D. in 2007, both from Stanford University in Aeronautics and Astronautics. He founded and directed the Waterloo Autonomous Vehicle Laboratory (WAVELab) from 2008 to 2018, and the Toronto Robotics and Artificial Intelligence Laboratory (TRAILab) at the University of Toronto from 2018 onward.
Prof. Waslander’s work on autonomous vehicles has resulted in the Autonomoose, the first autonomous vehicle created at a Canadian University to drive on public roads. His insights into autonomous driving have been featured in the Globe and Mail, Toronto Star, National Post, and the Rick Mercer Report. He has over 160 publications and hosts the Self-Driving Car Specialization on Coursera, which has accumulated over 150,000 learners worldwide since 2019.

Which talk track does this best fit into?
Technical

Technical level of your talk?
(Technical level: 6/7)

What you’ll learn:
That winter is harder than clear weather, but that we can still build safe self-driving cars for any weather condition, if we take the time to work through the added challenges.

Abstract of Talk:
Autonomous driving solutions are steadily progressing toward real-world deployments, but most companies are focused on driving in clear weather days in benign climates. Our work on exposing the challenges of Canadian winters for perception tasks has led to the Canadian Adverse Driving Conditions Dataset, and to multiple advances in all-weather autonomy that set the stage for more complete dominion of robotics systems over sensor degradation due to precipitation and accumulation. In this talk, I’ll highlight some of the worst problems that arise for autonomous systems in Canada, and lay out our plans for WinTOR, a new University of Toronto research program aimed at helping self-driving vehicles extend their range to our roadways year round.

Talk: 2022 AI Index Report Briefing

Presenter:
Nestor Maslej, Research Manager, AI Index, Stanford Institute for Human-Centered AI

About the Speaker:
Nestor Maslej is a Research Manager at Stanford’s Institute for Human-Centered Artificial Intelligence (HAI). In this position, he manages the AI Index and Global AI Vibrancy Tool. Nestor also leads research projects that study AI in the context of technical advancement, ethical concerns and policymaking. In developing tools that track the advancement of AI, Nestor hopes to make the AI space more accessible to policymakers.

Nestor also speaks frequently about trends in AI. He has delivered presentations about the AI Index to teams at the World Economic Forum, Centre for Data Ethics and Innovation and Global Arena Research Institute. Nestor has also testified to the Canadian Parliament’s House of Commons Standing Committee on Access to Information, Privacy and Ethics on the use and impact of facial recognition technology in Canada.

Prior to joining HAI, Nestor worked in Toronto as an analyst in several startups. He graduated from the University of Oxford in 2021 with an MPhil in Comparative Government, where he used machine learning methodologies to study the Canadian Indian Residential schooling system, and from Harvard College in 2017 with an A.B. in Social Studies.

Which talk track does this best fit into?
Technical

Technical level of your talk?
(Technical level: 3/7)

What you’ll learn:
That AI is here in a way that it was not before, and that as a society, we need to think critically about the role AI should play in our lives.

Abstract of Talk:
Learn about some of the main trends in AI, as told to you by the 2022 AI Index Report. The AI Index is one of the most widely read annual reports on trends in AI and has informed AI policymakers and industry leaders across the globe. This presentation covers some of the main trends explored in the report, namely trends in areas such as research and development, technical advancement, ethics, economics, policy and education.

Talk: Tips & Tricks for Intentional Text-to-Image Generation

Presenter:
Jordan Shaw, Creative Technology Lead, Half Helix

About the Speaker:
Jordan Shaw is an artist and creative technologist raised and currently based in Toronto, Canada. He grew up in Scarborough and received his MFA from OCAD University’s Digital Futures program, leading to his thesis being exhibited during Vector Festival at InterAccess. Before that, he completed his undergraduate degrees at Carleton University and Algonquin College, where his final installation was exhibited and recognized during ACM SIGGRAPH.

His work is related to exposing the hidden and unseen aspects of technology and the digital environment around us. The manifestation of this work tries to visualize the hidden interactions between people and technology, data collection and these digital systems trying to understand the physical world.

Jordan has exhibited internationally in Australia, Canada, Germany, Spain and the United States of America.

Which talk track does this best fit into?
Technical

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
The exploration, evolution and progression of ML throughout the last couple of years through a creative lens.

Abstract of Talk:
A creative evolution of ML in the arts. The speaker started producing artwork and exhibiting ML pieces in 2015. The reflective perspective of progression in ML and AI while observing creative model biases provides a unique perspective for the future of ML in creative fields and may influence popular culture down the road.

Lightning Talks w/ Open Group Discussion - Text Generation Using Large Language Models

Presenters:
Ali Sabet, ML Engineer, Cohere & Royal Sequeira, AI Research Scientist, LG Electronics Toronto AI Lab

About the Speakers:
Ali is a Machine Learning Engineer at Cohere, working on both text and image generation. He’s built viral apps, made fundamental contributions to instruct training rolled out in Cohere’s text products, and is leading image generation capabilities at Cohere.

Royal Sequeira is an AI Research Scientist at LG Toronto AI Lab. He completed his Master’s in Computer Science at the University of Waterloo. In the past, he has worked at Microsoft Research India and at Ada Support Inc. in Toronto. In 2018, he founded Sushiksha, a mentorship organization that helps hundreds of students across India.

Which talk track does this best fit into?
Technical

Talk: Declarative Machine Learning Systems: Ludwig & Predibase

Presenter:
Piero Molino, CEO & Co-Founder, Predibase

Which talk track does this best fit into?
Technical

Technical level of your talk?
(Technical level: 5/7)

Abstract of Talk:
Declarative Machine Learning Systems are a new trend that marries the flexibility of DIY machine learning infrastructure and the simplicity of AutoML solutions. In this talk we will discuss Ludwig, the open source declarative deep learning framework, and Predibase, an enterprise-grade solution based on it.

Talk: The Framework for Great ML Products

Presenter:
Liran Hason, CEO, Aporia

About the Speaker:
Liran Hason is the Co-Founder and CEO of Aporia, a full-stack ML observability platform used by Fortune 500 companies and data science teams across the world to ensure responsible AI. Prior to founding Aporia, Liran was an ML Architect at Adallom (acquired by Microsoft), and later an investor at Vertex Ventures. Liran created Aporia after seeing first-hand the effects of AI without guardrails. In 2022, Forbes named Aporia one of the “Next Billion-Dollar Companies”.

Which talk track does this best fit into?
Business Strategy

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
We’ll get down to the core reasons, and discuss a framework for building successful ML products, achieving data science-business alignment, and accomplishing trust and model value recognition.

Abstract of Talk:
F1 score, OKRs, Precision, KPIs – are more related than you’d think.
As an ML Engineer and business leader, Liran will talk about the frustration that both Data Scientists and business stakeholders experience with ML projects.

Danielle Goldfarb

Vice President and General Manager of Global Affairs, Economics and Public Policy, RIWI (Real-Time Interactive Worldwide Intelligence)

Opening speaker, TEDxToronto: The Smartest Way to Predict the Future; Developing a U of Toronto course on new data tools for global affairs / public policy; see also LinkedIn

Talk: The Risks of Excluding the Disengaged From your Dataset

Abstract: Analysts use machine learning tools to analyze the vast amounts of data now available. Big data is appealing, but even the best tools and techniques will give you noise if the underlying data aren’t robust and inclusive. To reliably predict elections, correctly anticipate consumer demand or to accurately predict the trajectory of a pandemic, leaders need to ask themselves who is (and isn’t) included in their dataset.

What You’ll Learn: There is a huge emphasis on tools to analyze big data but we need to spend at least as much time on the quality of the underlying dataset. If, for example, we exclude disengaged populations, as some common methods do, we risk getting things wrong, whether understanding a market’s potential, predicting an election result, or anticipating the trajectory of the pandemic, the war, or the economy.

Track: Business

Technical Level: 1

Location: Toronto

Talk: The Risks of Excluding the Disengaged From your Dataset

Presenter:
Danielle Goldfarb, Vice President and General Manager of Global Affairs, Economics and Public Policy, RIWI (Real-Time Interactive Worldwide Intelligence)

About the Speaker:
Opening speaker, TEDxToronto: The Smartest Way to Predict the Future; Developing a U of Toronto course on new data tools for global affairs / public policy; see also LinkedIn

Which talk track does this best fit into?
Business Strategy

Technical level of your talk?
(Technical level: 1/7)

What you’ll learn:
There is a huge emphasis on tools to analyze big data but we need to spend at least as much time on the quality of the underlying dataset. If, for example, we exclude disengaged populations, as some common methods do, we risk getting things wrong, whether understanding a market’s potential, predicting an election result, or anticipating the trajectory of the pandemic, the war, or the economy.

Abstract of Talk:
Analysts use machine learning tools to analyze the vast amounts of data now available. Big data is appealing, but even the best tools and techniques will give you noise if the underlying data aren’t robust and inclusive. To reliably predict elections, correctly anticipate consumer demand or to accurately predict the trajectory of a pandemic, leaders need to ask themselves who is (and isn’t) included in their dataset.

Talk: Managing AI and the Associated Risks in a Regulatory Environment

Presenters:
David Van Bruwaene, Founder and CEO, Fairly AI & Fion Lee-Madan, Co-Founder and COO, Fairly AI & Susie Lindsay, Counsel, Law Commission of Ontario

About the Speakers:
Fion has over 20 years of experience in enterprise software as a Solutions Architect (ex-Sapient, ex-Intuit, ex-ATG – acquired by Oracle). She double majored in Computer Science and Human Biology at the University of Toronto and has an MBA from Boston University. She is a technical committee member of the CIO Strategy Council of Canada. She is a champion of DE&I, and a major supporter of women in tech as both a mentor and coach. She guest lectures on AI Ethics at Lighthouse Labs, an education company with a goal to bring more diversity into the Data Science field.

David has developed a deep understanding of ethics and formal logic, model theory, and Natural Language Processing (NLP) throughout his career in business and academia. He taught Business Ethics at the University of Waterloo and graduate level Ethics at Cornell and is a sought-after speaker on AI Ethics, Compliance and Risk Management at conferences around the world. On an exchange scholarship from Cornell to Berkeley, he became fascinated with powerful Natural Language Processing. David applied this background to cyberbullying and related NLP in his first AI startup ViSR (acquired by SafetoNet), where he was the Head of Data Science, then became the CEO and Board Member. His path from practicing Data Scientist to CEO and Board Member made David conscious of the tensions between technical and business decisions and of the impact on people’s lives of the resulting automation of decision-making.

Susie is Counsel at the Law Commission of Ontario where she leads numerous LCO projects including: AI and the Civil Justice System, Protection Orders, and the LCO’s joint initiative on AI and Human Rights with the Ontario Human Rights Commission and Canadian Human Rights Commission. Before joining the Commission Susie practised regulatory law at a large communications company, and civil defence litigation at a boutique litigation firm. Susie is a graduate of Queen’s Law School, a Fulbright Scholar, has a Master of Laws from Harvard Law School and was a fellow at the Berkman Klein Centre for Internet & Society at Harvard University.

Which talk track does this best fit into?
Business Strategy

Technical level of your talk?
(Technical level: 2/7)

What you’ll learn:
Latest expert views on managing AI risks in the fast changing regulatory environment

Abstract of Talk:
Artificial Intelligence (AI) is a tool that can be used to manage risks in high-stakes industries such as financial services, but it also poses some risks by itself. This expert panel will discuss AI yesterday versus AI today, focusing on how AI has evolved and developed from simple automation to complex decisioning. Our experts will cover AI regulatory trends globally and in the US, best practices in risk and compliance management, and complementing technologies that can be utilized to counter new and emerging AI risks.

Panel: Bias and Relevance in Application of ML in Healthcare

Presenters:
Shazia Akbar, Lead Machine Learning Engineer, Altis Labs & Ali Madani, Director of Machine Learning, Cyclica & Santosh Hariharan, Principal Scientist, Pfizer & Shiva Amiri, VP, Head of AI and Data Intelligence, Pivotal Life Sciences & Javier Diaz-Mejia, Head of Data Science, Phenomic AI

About the Speakers:
Dr. Shazia Akbar is the lead machine learning engineer at Altis Labs, a Toronto-based startup which leverages deep learning technologies to gain prognostic insight from medical imaging data. Since joining Altis Labs, Shazia has designed and developed artificial intelligence systems which ingest millions of imaging data points to predict patient outcomes. Some of the applications she has developed to date include a fully automated model to quantify mortality risk in early stage lung cancer patients, and an x-ray model which determines in-patient admission risk of hospital patients diagnosed with community acquired pneumonia.

Shazia gained her PhD from the University of Dundee, UK, after which she joined the department of Radiology at New York University, US. In 2018, Shazia completed her postdoctoral fellowship at Sunnybrook Research Institute and the Vector Institute, designing novel deep learning algorithms for digital pathology. Her research interests include explainable AI, weakly supervised learning and applications of AI in healthcare.

Ali Madani leads ML technology development at Cyclica, a leading Canadian biotechnology company focused on AI-based drug discovery. He is also editor of the special topic Artificial Intelligence in Cancer Diagnosis and Therapy at MDPI and works as an AI educator with companies like WeCloudData. Ali is a PhD graduate of the University of Toronto and an alumnus of the University of Waterloo School of Engineering, and holds a master of mathematics from the University of Waterloo. He is an active member of the machine learning community in Toronto and speaks at international and Canada-wide conferences, webinars and workshops about technology development, machine learning, drug discovery and cancer therapeutics. He has also published more than 20 scientific articles in high impact factor journals on these subjects.

Santosh Hariharan is a Principal Scientist at Pfizer, committed to curing disease and improving patient lives. He enjoys solving complex biological problems using simple blocks with a motto of “Seeing is Believing”. He develops and analyzes complex biology by looking at individual cells, evaluating their response to drugs/genetic perturbation and developing predictive models using AI and machine learning (Phenotyping).

Shiva Amiri is the VP, Head of AI and Data Intelligence at Pivotal Life Sciences, working at the intersection of computing and biology with experience in large-scale, multi-stakeholder technology development in data science and biology with a focus on computational biology, bioinformatics, machine learning, and big data systems. Shiva is an entrepreneurial team builder working with cutting-edge computational methods in biology, digital health and medical research, with a track record in big data/data science and program execution and strategy.

Javier is a data scientist with 15+ years of experience in projects aiming to solve problems of relevance for human health. He has experience in the academic, non-profit and industry sectors in Mexico, USA and Canada. Javier’s role involves identifying organization data science needs, building teams to implement solutions addressing those needs, and serving as a bridge between technical and executive stakeholders. Javier completed his PhD studies in Mexico and his postdoctoral training in Toronto, and currently works as Head of Data Science at Phenomic AI, a biotech startup developing machine learning tools to fight cancer.

Which talk track does this best fit into?
Business

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
Examples of application of ML in healthcare; Impact of ML in healthcare technologies on patients; Potential biases in ML in Healthcare technologies

Abstract of Talk:
Machine Learning (ML) technologies learn how to accomplish tasks and identify patterns from available data. This data-based learning has been critical in helping healthcare institutions and biotechnology and pharmaceutical companies develop new technologies to improve processes like disease diagnosis from radiological and pathological images and drug design. In this panel, we will discuss how companies design ML technologies that are not only in line with their business model but will eventually impact patients and healthcare systems. We will also discuss the technological and sociological biases that need to be taken into account in the design of such technologies.

Talk: Data FAILS

Presenters:
Nicolás Venegas Oliva, Technical Lead of Advanced Analytics, LATAM Airlines & Sarah Sun, Director Data Science, Scotiabank

About the Speakers:
Nicolas Matias Venegas Oliva has 2 years of experience in backend development, 2+ years in data processing and the last 3+ years as Advanced Analytics technical leader at LATAM Airlines. During this time the team has grown from 9 to 48 highly trained professionals. It has also become the company’s highest-impact team and a regional reference for MLOps and for business impact measured through data products.

A decade in data has taken Sarah across multiple industries, including banking, technology, and natural resources. While specializing in data strategy, she was trained as a data scientist and has worked across the industry in innovation, governance, and AI, including a stint as CEO of a startup. Working in data has taught Sarah some valuable lessons – everything from seizing opportunities, the importance of mental health, and the power of sharing stories. Sarah was named one of the Women Executive Network’s Top 100 Most Powerful Women in 2019.

Which talk track does this best fit into?
Business Strategy

Technical level of your talk?
(Technical level: 1/7)

What you’ll learn:
Lessons on what not to do, drawn from the experiences we share. We like to talk about the wins, but the reality is that fails are more frequent, and they offer more lessons to be learnt!

Abstract of Talk:
We like to talk about the successes…but why don’t we ever talk about the FAILS across the data world? Join us as we swap stories crossing data, industrial, and even geographical boundaries. We may have failed…but maybe amongst all the tales there’s a lesson or two to be learnt across networking, recruiting, planning, model building, engineering….you name it ;)

Muhammad Mamdani

Unity Health Toronto – VP: Data Science and Advanced Analytics; Director: Temerty Centre for Artificial Intelligence Research and Education in Medicine of the University of Toronto; Professor – University of Toronto

Dr. Mamdani is Vice President of Data Science and Advanced Analytics at Unity Health Toronto and Director of the University of Toronto Temerty Faculty of Medicine Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM). Dr. Mamdani’s team bridges advanced analytics including machine learning with clinical and management decision making to improve patient outcomes and hospital efficiency. Dr. Mamdani is also Professor in the Department of Medicine of the Temerty Faculty of Medicine, the Leslie Dan Faculty of Pharmacy, and the Institute of Health Policy, Management and Evaluation of the Dalla Lana Faculty of Public Health. He is also adjunct Senior Scientist at the Institute for Clinical Evaluative Sciences (ICES) and a Faculty Affiliate of the Vector Institute. In 2010, Dr. Mamdani was named among Canada’s Top 40 under 40. He has published over 500 studies in peer-reviewed medical journals. Dr. Mamdani obtained a Doctor of Pharmacy degree (PharmD) from the University of Michigan (Ann Arbor) and completed a fellowship in pharmacoeconomics and outcomes research at the Detroit Medical Center. During his fellowship, Dr. Mamdani obtained a Master of Arts degree in Economics from Wayne State University in Detroit, Michigan with a concentration in econometric theory. He then completed a Master of Public Health degree from Harvard University in 1998 with a concentration in quantitative methods.

Talk: Saving Lives with ML: Applications and Learnings

Abstract: Machine learning (ML) has transformed numerous industries but its application in healthcare has been limited. ML applications are expected to permeate healthcare in the near future with a recent explosion in academic and commercial activity. The application of ML in healthcare, however, is complicated by a variety of factors including the significant variability in needs, healthcare settings and patients served in these settings, workflows, and available resources. This talk will present a case study of Unity Health Toronto and its journey in developing and deploying numerous ML solutions into clinical practice, including bridging public and private sector partnerships to spread innovations internationally. The talk will also present a novel Canadian academic centre dedicated to artificial intelligence (AI) in medicine – the Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM) at the University of Toronto.

What You’ll Learn: The successful application of ML in healthcare is multifaceted and highly dependent on end-user engagement.
Innovative public-private partnerships are needed to spread ML applications globally.
Multidisciplinary, collaborative efforts will fuel innovations in the development and application of ML in healthcare.

Track: Case Study

Technical Level: 3

Location: Toronto

Talk: Saving Lives with ML: Applications and Learnings

Presenter:
Muhammad Mamdani, Unity Health Toronto – VP: Data Science and Advanced Analytics; Director: Temerty Centre for Artificial Intelligence Research and Education in Medicine of the University of Toronto; Professor – University of Toronto

About the Speaker:
Dr. Mamdani is Vice President of Data Science and Advanced Analytics at Unity Health Toronto and Director of the University of Toronto Temerty Faculty of Medicine Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM). Dr. Mamdani’s team bridges advanced analytics including machine learning with clinical and management decision making to improve patient outcomes and hospital efficiency.
Dr. Mamdani is also Professor in the Department of Medicine of the Temerty Faculty of Medicine, the Leslie Dan Faculty of Pharmacy, and the Institute of Health Policy, Management and Evaluation of the Dalla Lana Faculty of Public Health. He is also adjunct Senior Scientist at the Institute for Clinical Evaluative Sciences (ICES) and a Faculty Affiliate of the Vector Institute. In 2010, Dr. Mamdani was named among Canada’s Top 40 under 40. He has published over 500 studies in peer-reviewed medical journals.
Dr. Mamdani obtained a Doctor of Pharmacy degree (PharmD) from the University of Michigan (Ann Arbor) and completed a fellowship in pharmacoeconomics and outcomes research at the Detroit Medical Center. During his fellowship, Dr. Mamdani obtained a Master of Arts degree in Economics from Wayne State University in Detroit, Michigan with a concentration in econometric theory. He then completed a Master of Public Health degree from Harvard University in 1998 with a concentration in quantitative methods.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 3/7)

Are there any industries (in particular) that are relevant for this talk?
Hospital & Health Care

Who is this presentation for?
The successful application of ML in healthcare is multifaceted and highly dependent on end-user engagement.
Innovative public-private partnerships are needed to spread ML applications globally.
Multidisciplinary, collaborative efforts will fuel innovations in the development and application of ML in healthcare.

Abstract of Talk:
Machine learning (ML) has transformed numerous industries but its application in healthcare has been limited. ML applications are expected to permeate healthcare in the near future with a recent explosion in academic and commercial activity. The application of ML in healthcare, however, is complicated by a variety of factors including the significant variability in needs, healthcare settings and patients served in these settings, workflows, and available resources. This talk will present a case study of Unity Health Toronto and its journey in developing and deploying numerous ML solutions into clinical practice, including bridging public and private sector partnerships to spread innovations internationally. The talk will also present a novel Canadian academic centre dedicated to artificial intelligence (AI) in medicine – the Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM) at the University of Toronto.

Nikita Medvedev

Director of Advanced Analytics, Coca Cola

Nikita has over 10 years of experience in the Retail and Consumer Packaged Goods industries, working for companies like Loblaw and Sears. He is also an alumnus of the Master of Management Analytics program from Queen’s University, and holds a Bachelor of Finance & Economics degree from the University of Toronto.

Co-Presenter: Winston Li

Talk: The Application of Mobile Location Data for Vending Machine Site Selection and Revenue Optimization.

Abstract: In this presentation, we present an innovative approach to utilizing mobility data to optimize the placement of vending machines in Canada. Coca-Cola has more than 10k vending machines in various locations, and their ROI heavily depends on the amount of foot traffic next to them as well as who those people are. For this use case, we’ll concentrate on using highly detailed mobility data to understand the difference between our best and worst machines at scale, and on optimizing their locations based on that data to increase the ROI. In addition to the practical and business application, we’ll also share the algorithms used and the tech stack with the audience.

What You’ll Learn: Mobility data is an alternative data source for consumer-related analytics, and its recency and granularity can really drive measurable business outcomes.

Track: Case Study

Technical Level: 4

Location: Toronto

Talk: The Application of Mobile Location Data for Vending Machine Site Selection and Revenue Optimization

Presenters:
Nikita Medvedev, Director of Advanced Analytics, Coca-Cola & Winston Li, Founder, Arima

About the Speaker:
Winston is the founder of Arima, a Canadian-based startup that provides consumer data to its users. Its flagship product, the Synthetic Society, is a privacy-by-design, individual-level database that mirrors the real society. Built using trusted sources like census, market research, mobility and purchase patterns, it contains 10k+ attributes across North America and enables advanced modelling at the most granular level.

Prior to founding Arima, Winston was the Director of Data Science at PwC and Omnicom. Winston is also a part-time faculty member at Northeastern University Toronto and sits on the advisory board of the Master of Analytics program.

Nikita is the Director of Advanced Analytics at Coca-Cola Canada Bottling Limited. Together with his team he is transforming terabytes of business operations data into actionable insights to drive growth and innovate in the Consumer Packaged Goods industry. He loves finding novel solutions to old problems and is obsessed with driving real lasting change through better use of data.

Nikita has over 10 years of experience in the Retail and Consumer Packaged Goods industries, working for companies like Loblaw and Sears. He is also an alumnus of the Master of Management Analytics program from Queen’s University, and holds a Bachelor of Finance & Economics degree from University of Toronto.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 4/7)

Are there any industries (in particular) that are relevant for this talk?
Food & Beverages, Information Technology & Service, Marketing & Advertising

What are the main core message (learning) you want attendees to take away from this talk?
Mobility data is an alternative data source for consumer-related analytics, and its recency and granularity can really drive measurable business outcomes.

Abstract of Talk:
In this presentation, we present an innovative approach to utilizing mobility data to optimize the placement of vending machines in Canada. Coca-Cola has more than 10k vending machines in various locations, and their ROI heavily depends on the amount of foot traffic next to them as well as who those people are. For this use case, we’ll concentrate on using highly detailed mobility data to understand the difference between our best and worst machines at scale, and on optimizing their locations based on that data to increase the ROI. In addition to the practical and business application, we’ll also share the algorithms used and the tech stack with the audience.

Shiming Ren

Senior Engineering Manager – Safety, MLOps and Infrastructure, Amazon/Twitch

I worked as a Software Engineering Manager at Twitch on MLOps and tooling in the Safety team. I spoke at Meta’s At Scale about Scaling ML Workflows for Real-Time Moderation Challenges at Twitch, and at TwitchCon about Integrating Data into Twitch at Scale. I have worked in engineering leadership roles for 5 years, and our team built several company-wide MLOps tools, such as orchestration and a feature store.

Co-Presenter: Chen Liu

Talk: From Silo to Collaboration – Building Tooling to Support Distributed ML Teams at Twitch

Abstract: In this talk, we will cover Twitch’s current ML team structure and its challenges. Then we dive deep into some solutions we have built to support ML development at Twitch, including what they are and how they improve the situation. We close with a discussion of Twitch’s distributed ML team style and how we collaborate, using Conductor as an example.

ML has been playing an increasingly important role in Twitch’s products (e.g. Recommendation, Safety). To allow products to iterate fast, we keep ML practitioners in the product teams and empower the teams to work independently. Still, there are common challenges in ML development regardless of product area, so we are striving to develop tooling and infrastructure for general ML development in order to reduce duplicate work across ML teams. We will dive into those efforts in this presentation. For example, the Twitch machine learning feature store has a single control plane that serves as the feature registry but facilitates distributed feature ownership (e.g. storage, pipelines). Conductor, an in-house ML orchestration system, promotes best practices in pipeline management with templated process control flow and distributed infrastructure management. Meanwhile, we are promoting a collaborative ML culture among Twitch engineering teams. It is similar to community-owned open source projects, where teams share the same interests and encourage cross-team contribution and development.

What You’ll Learn: Twitch’s strategy for scaling our ML infra and MLOps tooling has never been discussed online. We aim to help the audience figure out the best strategy for using ML tooling to enhance collaboration between ML teams and boost scientists’ self-service and efficiency. This is a good lesson for companies seeking to start MLOps from scratch.

Track: Case Study

Technical Level: 4

Talk: From Silo to Collaboration - Building Tooling to Support Distributed ML Teams at Twitch

Presenters:
Shiming Ren, Sr. Engineering Manager – Safety, MLOps and Infrastructure & Chen Liu, Twitch Sr. Engineering Manager on Personalization and ML Infra, Amazon/Twitch

About the Speaker:
Shiming worked as a Software Engineering Manager at Twitch on MLOps and tooling in the Safety team. Shiming spoke at Meta’s At Scale about Scaling ML Workflows for Real-Time Moderation Challenges at Twitch, and at TwitchCon about Integrating Data into Twitch at Scale. Shiming has worked in engineering leadership roles for 5 years, leading a team that built several company-wide MLOps tools, such as orchestration and a feature store.

Chen is currently supporting teams working on personalization and ML infrastructures at Twitch. He is passionate about building scalable ML products and democratizing ML in the organization.

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 4/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Data Scientists/ ML Engineers, ML Engineers

What you’ll learn:
Twitch’s strategy for scaling our ML infra and MLOps tooling has never been discussed online. We aim to help the audience figure out the best strategy for using ML tooling to enhance collaboration between ML teams and boost scientists’ self-service and efficiency. This is a good lesson for companies seeking to start MLOps from scratch.

On a scale of 1-10 how mature is this applied AI application you plan to discuss?
7/10

Pre-requisite Knowledge:
Feature store, Orchestration, Large Scale Data Handling

What kind of DevOps tools you plan to discuss? Open source?
N/A Our tools are all in house

What are some of the languages you plan to discuss?
Python, Golang

What are some of the infrastructures you plan to discuss?
Feature Store, ML Orchestration, Realtime Inference, Distributed ML team collaborations

What is unique about this speech, from other speeches given on the topic?
We aim to use examples of how Twitch built its in-house feature store, real-time inference and orchestration systems to demonstrate, from a technology perspective, how MLOps collaboration works in a company. This is more of a hybrid tech and management talk, which will benefit both engineering and leadership groups.

Abstract of Talk:
[High level intro]
In this talk, we will cover Twitch’s current ML team structure and its challenges. Then we dive deep into some solutions we have built to support ML development at Twitch, including what they are and how they improve the situation. We close with a discussion of Twitch’s distributed ML team style and how we collaborate, using Conductor as an example.

[Actual abstract]
ML has been playing an increasingly important role in Twitch’s products (e.g. Recommendation, Safety). To allow products to iterate fast, we keep ML practitioners in the product teams and empower the teams to work independently. Still, there are common challenges in ML development regardless of product area, so we are striving to develop tooling and infrastructure for general ML development in order to reduce duplicate work across ML teams. We will dive into those efforts in this presentation. For example, the Twitch machine learning feature store has a single control plane that serves as the feature registry but facilitates distributed feature ownership (e.g. storage, pipelines). Conductor, an in-house ML orchestration system, promotes best practices in pipeline management with templated process control flow and distributed infrastructure management. Meanwhile, we are promoting a collaborative ML culture among Twitch engineering teams. It is similar to community-owned open source projects, where teams share the same interests and encourage cross-team contribution and development.
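
As a rough illustration of the "single control plane, distributed ownership" idea described in the abstract, here is a minimal Python sketch. It is not Twitch's implementation: the class names, fields and the example storage URI are hypothetical, chosen only to show a central registry that records which team owns each feature while the storage itself stays with that team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEntry:
    """Registry record for one feature (all fields are hypothetical)."""
    name: str
    owner_team: str    # team that owns the pipeline and storage
    storage_uri: str   # where the owning team keeps the feature data


class FeatureRegistry:
    """Central registry: knows about every feature, stores none of the data."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Feature names are global, so a second registration is rejected.
        if entry.name in self._entries:
            raise ValueError(f"feature already registered: {entry.name}")
        self._entries[entry.name] = entry

    def lookup(self, name):
        # Consumers ask the registry where a feature lives and who owns it.
        return self._entries[name]

    def owned_by(self, team):
        # List the features a given team is responsible for.
        return sorted(e.name for e in self._entries.values() if e.owner_team == team)


registry = FeatureRegistry()
registry.register(FeatureEntry("chat_msg_rate", "safety", "s3://example-safety/chat_msg_rate"))
registry.register(FeatureEntry("watch_minutes", "recs", "s3://example-recs/watch_minutes"))
```

In this sketch any team can discover a feature through `registry.lookup(...)`, while reads and writes of the underlying data would still go through infrastructure owned by `owner_team`.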

Can you suggest 2-3 topics for post-discussion?
Manage ML teams collaboration in a distributed manner; ML tooling development from 0 to 10; Implementation details for feature store and ML orchestration system.

Valerii Podymov

Lead Data Scientist, FreshBooks

Valerii joined FreshBooks a year ago to lead and grow a team of Data Scientists and Machine Learning Engineers. He has experience in multiple industries ranging from Electronics to Clean Tech and has contributed to the development of innovative solutions for a variety of brands such as LG Electronics, Panasonic, Samsung, Toyota, Scotiabank and Cineplex. He has a University Degree in Telecom Engineering and a PhD in Automated Control Systems. He is the author of 20 patented inventions in Signal Processing, Electronics and Computing.

Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Abstract: Recent innovations in the ML ecosystem have seen the emergence of operationally-focused technology like declarative systems and data-centric AI. These techniques appear to be a radical change for AI practitioners, who can now more simply frame use cases and manage workflows. In this talk, we’ll take a look at the history of AI to see the progress that has been made and how we’ve arrived at where we are now. How are high-tech companies handling AI initiatives internally, and why aren’t we all copying them? Has MLOps been the promised solution to simplifying deployment and monitoring of production AI? How do we create a simpler paradigm for operationalizing AI? All these questions and more will be addressed.

What You’ll Learn: The journey to higher levels of MLOps maturity is unique to each company and has no recipes, due to the experimental nature of MLOps. Many insights and ideas in this area are the results of investments by big names (Google, Microsoft, Amazon) and knowledge sharing between smaller companies like us working on similar problems. We are grateful for this opportunity to contribute to the ecosystem so that others can learn from us.

Track: Case Study

Technical Level: 6

Location: Toronto

Talk: Building a Fully Automated ML Platform Using Kubeflow and Declarative Approach to Development of End-to-End ML Pipelines

Presenters:
Valerii Podymov, Lead Data Scientist, FreshBooks & Roshan Isaac, Machine Learning Engineer, FreshBooks & Vlad Ryzhkov, Senior Data Engineer, FreshBooks & Joey Zhou, Senior Data Engineer, FreshBooks

About the Speaker:
Valerii joined FreshBooks a year ago to lead and grow a team of Data Scientists and Machine Learning Engineers. He has experience in multiple industries ranging from Electronics to Clean Tech and has contributed to the development of innovative solutions for a variety of brands such as LG Electronics, Panasonic, Samsung, Toyota, Scotiabank and Cineplex. He has a University Degree in Telecom Engineering and a PhD in Automated Control Systems. He is the author of 20 patented inventions in Signal Processing, Electronics and Computing.

Roshan works as a Machine Learning Engineer at FreshBooks, where he is building an ML platform on Vertex AI and bringing MLOps best practices to the organization. He previously held the same role at Cineplex. He has a Bachelor’s degree in Computer Science and Engineering and holds graduate certificates in AI & Project Management. Overall he has 8+ years of experience in Machine Learning, Data Analytics and CRM software, working in different startups and companies in Canada and India. He has published papers in IEEE conferences and was a speaker at the Libre Software Meeting (LSM), France.

Vlad joined FreshBooks a year ago with an extensive Data Engineering background. He works on building the ML platform, bringing best practices in large-scale data processing to the company. He has a PhD in System Analysis, Management and Information Processing. Overall, his 15+ years of software development experience span areas such as financial systems, e-commerce, e-sport and airlines in Canada and overseas.

Joey joined FreshBooks three months ago and works on the continuous monitoring framework for the ML team. Previously, he worked in the tech industry, from social dating to e-commerce, in multiple roles such as Data Scientist and Machine Learning Engineer. He built a recommender system for one of the largest e-commerce platforms in China. With hands-on experience in building and productionizing ML models, he is ready to pursue his passion for MLOps at FreshBooks.

Which talk track does this best fit into?
Technical / Research

Technical level of your talk?
(Technical level: 6/7)

Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Computer Software

Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers

What you’ll learn:
How we tackled existing challenges with Kubeflow pipelines by moving from an imperative approach to a declarative one.

What are the main core message (learning) you want attendees to take away from this talk?
The journey to higher levels of MLOps maturity is unique to each company and has no recipes, due to the experimental nature of MLOps. Many insights and ideas in this area are the results of investments by big names (Google, Microsoft, Amazon) and knowledge sharing between smaller companies like us working on similar problems. We are grateful for this opportunity to contribute to the ecosystem so that others can learn from us.

On a scale of 1-10 how mature is this applied AI application you plan to discuss?
9/10

Pre-requisite Knowledge:
Machine Learning Lifecycle

What kind of DevOps tools you plan to discuss? Open source?
GitHub Actions, Kubeflow

What are some of the languages you plan to discuss?
Python, SQL

What are some of the infrastructures you plan to discuss?
BigQuery, Airflow, Vertex AI, containers

What is unique about this speech, from other speeches given on the topic?
Managing MLOps is a highly immature topic with few or no commonly accepted best practices, so each company’s experience of moving up the MLOps maturity levels is unique.

Abstract of Talk:
This talk is about our journey at FreshBooks from mostly manual processes for productionizing our ML models to the highest levels of MLOps maturity. First, we briefly go over the challenges we faced when working on the ML platform as a hybrid team of Data Scientists, ML Engineers and Data Ops Engineers. Then we provide a more detailed overview of our end-to-end Kubeflow pipelines and a declarative MLOps framework designed to speed up, simplify and improve the reliability of ML pipelines at each stage from development to production. We close with lessons learned and what’s next.
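
To make the imperative-versus-declarative distinction concrete, here is a minimal Python sketch. It does not use Kubeflow and is not the FreshBooks framework: the spec format, step names and `execution_order` helper are hypothetical. The point is only that a declarative pipeline states what steps exist and what each one needs, and a generic engine derives the execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical declarative spec: each step only declares its dependencies.
PIPELINE_SPEC = {
    "extract": {"needs": []},
    "train": {"needs": ["extract"]},
    "evaluate": {"needs": ["train"]},
    "deploy": {"needs": ["evaluate"]},
}


def execution_order(spec):
    """Derive a valid run order from the declared dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {step: set(cfg["needs"]) for step, cfg in spec.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(PIPELINE_SPEC)
```

With an imperative approach, the same ordering would be hard-coded as a sequence of calls; here, adding a step means adding one entry to the spec, and the ordering logic stays unchanged.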

Can you suggest 2-3 topics for post-discussion?
ML Ops, ML Model Governance

Eric Hart

Staff Data Scientist, Anheuser-Busch

Eric is a Staff Data Scientist with more than 7 years of experience working at Altair Engineering and Anheuser-Busch. He has a PhD in probability from the University of Toronto, and a master’s degree in Applied Math and an undergraduate degree in Engineering from Queen’s University. He’s also a world champion Blokus player.

Talk: Optimal Beer Pricing: An Optimization Layer for Price Elasticities

Abstract: At Anheuser-Busch, we’re obsessed with price elasticities. When the price of beer changes, how will that affect the volume of beer that we sell? These questions (yes, this is more than one question) have implications all over the business, from price setting to procurement to financial planning. We’ve worked hard to make sure our answers to these questions are as data driven as possible. But once we have a model to produce (and predict) these elasticities, how do we make business decisions based on that? And how do we make sure those business decisions are also as data driven as possible?

In this talk we’ll discuss an optimal pricing layer for beer elasticities. We’ll cover how to use mathematical optimization to make specific price change suggestions at a variety of granularities to help achieve specific business objectives. We’ll consider what objective we actually want to optimize (Profit? Revenue? Market Share?) and see how to use constraints to help smooth the trade-off between these objectives. Finally, we’ll investigate how to ensure our price suggestions stay within the regions where the underlying elasticities models make sense.

Ever wanted to see a real-world example of levelling up your analytics from predictive- to prescriptive-, and do so in the context of price setting (or beer drinking)? Now’s your chance!

What You’ll Learn: How to add an optimization layer to ML models.

Track: Case Study

Technical Level: 2

Location: Toronto

Talk: Optimal Beer Pricing: An Optimization Layer for Price Elasticities

Presenter:
Eric Hart, Staff Data Scientist at Anheuser-Busch

About the Speaker:
Eric is a Staff Data Scientist with more than 7 years of experience working at Altair Engineering and Anheuser-Busch. He has a PhD in probability from the University of Toronto, and a master’s degree in Applied Math and an undergraduate degree in Engineering from Queen’s University. He’s also a world champion Blokus player.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 2/7)

Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Food & Beverages, Marketing & Advertising

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers

What you’ll learn:
Putting a mathematical optimization layer on top of predictive models is still a mostly unused tool in the ML space. It’s very difficult to learn about that from existing resources.

What are the main core message (learning) you want attendees to take away from this talk?
How to add an optimization layer to ML models.

Pre-requisite Knowledge:
Not a lot. We’ll briefly discuss what price-elasticities and mathematical optimization are, but having heard those terms before (with a basic understanding) would help.

What is unique about this speech, from other speeches given on the topic?
I would argue the whole topic is fairly unique (optimization layers for predictive models are not widely used or discussed). In addition, the specifics of trying to work around the realities of the beer industry (especially varying laws about beer pricing across different geographies) add an extra layer of complexity to this already deep problem.

Abstract of Talk:
At Anheuser-Busch, we’re obsessed with price elasticities. When the price of beer changes, how will that affect the volume of beer that we sell? These questions (yes, this is more than one question) have implications all over the business, from price setting to procurement to financial planning. We’ve worked hard to make sure our answers to these questions are as data driven as possible. But once we have a model to produce (and predict) these elasticities, how do we make business decisions based on that? And how do we make sure those business decisions are also as data driven as possible?

In this talk we’ll discuss an optimal pricing layer for beer elasticities. We’ll cover how to use mathematical optimization to make specific price change suggestions at a variety of granularities to help achieve specific business objectives. We’ll consider what objective we actually want to optimize (Profit? Revenue? Market Share?) and see how to use constraints to help smooth the trade-off between these objectives. Finally, we’ll investigate how to ensure our price suggestions stay within the regions where the underlying elasticities models make sense.

Ever wanted to see a real-world example of levelling up your analytics from predictive- to prescriptive-, and do so in the context of price setting (or beer drinking)? Now’s your chance!
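
As a small illustration of what an optimization layer on top of an elasticity model can look like, here is a Python sketch. It is not the speaker's method: it assumes a constant-elasticity demand curve, q(p) = q0 · (p / p0)^e, uses profit as the objective, and restricts the search to prices within a fixed percentage of the current price as a stand-in for "the region where the elasticity model makes sense". All function names and numbers are hypothetical.

```python
def demand(price, base_price, base_volume, elasticity):
    """Constant-elasticity demand: volume scales as (price / base_price) ** elasticity."""
    return base_volume * (price / base_price) ** elasticity


def profit(price, unit_cost, base_price, base_volume, elasticity):
    """Profit objective: margin per unit times predicted volume."""
    return (price - unit_cost) * demand(price, base_price, base_volume, elasticity)


def best_price(unit_cost, base_price, base_volume, elasticity,
               max_change=0.10, steps=2001):
    """Grid-search the profit-maximizing price within +/- max_change of base_price."""
    lo = base_price * (1 - max_change)
    hi = base_price * (1 + max_change)
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(
        candidates,
        key=lambda p: profit(p, unit_cost, base_price, base_volume, elasticity),
    )


# Hypothetical product: cost 5, current price 10, elasticity -1.5.
suggested = best_price(unit_cost=5.0, base_price=10.0, base_volume=1000.0, elasticity=-1.5)
```

Under this demand model with elasticity below -1, the unconstrained profit-maximizing price is cost · e / (1 + e), which is 15 for the example above. The price-change constraint therefore binds, and the suggestion lands at the upper bound of the allowed range rather than at the unconstrained optimum. Swapping the objective for revenue or adding further constraints follows the same pattern.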

Can you suggest 2-3 topics for post-discussion?
Optimization Layers. Price Elasticities.

Jawad Ahmed

Staff Applied Scientist, Loblaw Digital

Jawad currently works as a Staff Applied Scientist at Loblaw Digital, supporting ML teams building personalization and recommender systems for different lines of business of Loblaw Companies.
He has 8 years of industry experience in Applied AI/ML. Previously, he worked at Flipp, Dialpad and McKinsey Solutions.

His interests lie in applying ML research to build products with scalable ML solutions in NLP, Conversational AI, Computer Vision and Recommender Systems.

Talk: Solving Product Substitutions, The #1 Problem in Grocery E-Commerce – Through Self-Supervised ML

Abstract:
Background: Loblaw Companies Ltd is the largest grocery retailer in Canada. It operates multiple banners, with Real Canadian Superstore, No Frills, and T&T being some of the most popular ones. Grocery e-commerce has become a significant part of the business, accounting for more than $2 billion in sales per year.

Problem: Shopping for groceries online is an inherently different process than shopping in person. We take for granted the in-store shopper’s ability to make quick decisions on the fly when faced with the issue of product availability.

We fulfill orders from stores to ensure freshness, but store inventory is very dynamic. Items are sometimes collected a day or two after the order is placed, depending on the customer’s delivery date, so the promise of an item is affected by many factors, some of which we cannot control. Thus, we need a solution to substitute items that are out of stock at the time of picking to make sure the customer experience is minimally impacted. While shopping at a physical store, a customer can make a suitable choice of an alternative. In the e-commerce process of grocery shopping, either the customer has to select the substitute, or the Loblaw employee picking the order on behalf of the customer needs a relevant suggestion for the best substitute for the given item, personalized for the given customer.

Loblaw has historical data available on what selection was made by customers from the list of various possible substitute options available for a given item. Additionally, there is data available on the choices made by pickers – the employees who shop at the store to fulfill customers’ orders. This provides us an opportunity to tailor product similarities toward product substitutions that are tied to business metrics.

Solution: We explored multiple solutions to solve this problem. Our most promising solution that we wish to present leverages features extracted from text descriptions and images of products. In this talk, we will discuss how our approach evolved over time and how this cutting-edge self-supervised method is a big improvement over the traditional techniques.

What You’ll Learn: The talk covers the data curation process by which we prepared a benchmarking Products Substitutions dataset using historical human-selected substitutions data at Loblaw.

The audience will learn about the self-supervised ML approaches we used to recommend product substitutions, benchmarked against the above-mentioned product substitutions test set.

Track: Case Study

Technical Level: 5

Location: Waterloo, ON

Talk: Solving Product Substitutions, A Big Problem in Grocery E-Commerce – Through Self-Supervised ML

Presenter:
Jawad Ahmed, Staff Applied Scientist, Loblaw Digital

About the Speaker:
Jawad currently works as a Staff Applied Scientist at Loblaw Digital, supporting ML teams building personalization and recommender systems for different lines of business of Loblaw Companies.
He has 8 years of industry experience in Applied AI/ML. Previously, he worked at Flipp, Dialpad and McKinsey Solutions.
His areas of interest are using ML research applications to help build products with scalable ML solutions in NLP, Conversational AI, Computer Vision and Recommender Systems. Read more on LinkedIn.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
The talk covers the data curation process by which we prepared a benchmarking Products Substitutions dataset using historical human-selected substitutions data at Loblaw.

The audience will learn about the self-supervised ML approaches we used to recommend product substitutions, benchmarked against the above-mentioned product substitutions test set.

Abstract of Talk:
Background: Loblaw Companies Ltd is the largest grocery retailer in Canada. It operates multiple banners, with Real Canadian Superstore, No Frills, and T&T being some of the most popular ones. Grocery e-commerce has become a significant part of the business, accounting for more than $2 billion in sales per year.

Problem: Shopping for groceries online is an inherently different process than shopping in person. We take for granted the in-store shopper’s ability to make quick decisions on the fly when faced with the issue of product availability.

We fulfill orders from stores to ensure freshness, but store inventory is very dynamic. Items are sometimes collected a day or two after the order is placed, depending on the customer’s delivery date, so the promise of an item is affected by many factors, some of which we cannot control. Thus, we need a solution to substitute items that are out of stock at the time of picking to make sure the customer experience is minimally impacted. While shopping at a physical store, a customer can make a suitable choice of an alternative. In the e-commerce process of grocery shopping, either the customer has to select the substitute, or the Loblaw employee picking the order on behalf of the customer needs a relevant suggestion for the best substitute for the given item, personalized for the given customer.

Loblaw has historical data available on what selection was made by customers from the list of various possible substitute options available for a given item. Additionally, there is data available on the choices made by pickers – the employees who shop at the store to fulfill customers’ orders. This provides us an opportunity to tailor product similarities toward product substitutions that are tied to business metrics.

Solution: We explored multiple solutions to solve this problem. Our most promising solution that we wish to present leverages features extracted from text descriptions and images of products. In this talk, we will discuss how our approach evolved over time and how this cutting-edge self-supervised method is a big improvement over the traditional techniques.

Quoc Tien Au

Data Scientist, Manifest Climate

I am a data scientist at Manifest Climate, working on applying machine learning and natural language processing to climate disclosures. Extracting information at scale is paramount to increase transparency in financial markets, so that we can improve decision-making with data-driven climate information.

Talk: Assessing Alignment of Climate Disclosures Using NLP for the Financial Markets

Abstract: Climate-related disclosure is increasing in importance as companies and stakeholders alike aim to reduce their environmental impact and exposure to climate-induced risk. Companies primarily disclose this information in annual or other lengthy documents where climate information is not the sole focus. To assess the quality of a company’s climate-related disclosure, these documents, often hundreds of pages long, must be reviewed manually by climate experts. We propose a more efficient approach to assessing climate-related financial information. We construct a model leveraging TF-IDF, sentence transformers and multi-label k nearest neighbors (kNN). The developed model is capable of assessing alignment of climate disclosures at scale, with a level of granularity and transparency that will support decision-making in the financial markets with relevant climate information.

What You’ll Learn: How an early-stage startup runs machine learning experiments; makes decisions balancing model performance, model explainability, resource constraints and added business value; and uses deep language models to create the most valuable business opportunities.

Track: Case Study

Technical Level: 5

Location: Toronto

Talk: Assessing Alignment of Climate Disclosures Using NLP for the Financial Markets

Presenters:
Quoc Tien Au, Data Scientist, Manifest Climate & Aysha Cotterill, Data Analyst, Manifest Climate

About the Speakers:
I am a data scientist at Manifest Climate, working on applying machine learning and natural language processing to climate disclosures. Extracting information at scale is paramount to increase transparency in financial markets, so that we can improve decision-making with data-driven climate information.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
How an early-stage startup runs machine learning experiments; makes decisions balancing model performance, model explainability, resource constraints and added business value; and uses deep language models to create the most valuable business opportunities.

Abstract of Talk:
Climate-related disclosure is increasing in importance as companies and stakeholders alike aim to reduce their environmental impact and exposure to climate-induced risk. Companies primarily disclose this information in annual or other lengthy documents where climate information is not the sole focus. To assess the quality of a company’s climate-related disclosure, these documents, often hundreds of pages long, must be reviewed manually by climate experts. We propose a more efficient approach to assessing climate-related financial information. We construct a model leveraging TF-IDF, sentence transformers and multi-label k nearest neighbors (kNN). The developed model is capable of assessing alignment of climate disclosures at scale, with a level of granularity and transparency that will support decision-making in the financial markets with relevant climate information.
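To make the TF-IDF and multi-label kNN part of this pipeline concrete, here is a minimal sketch in plain Python. It is an assumption-laden illustration, not the presenters’ model: it leaves out sentence transformers entirely, uses a simple neighbor vote rather than any specific published multi-label kNN algorithm, and every function name, label and document in it is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def predict_labels(query, docs, labels, k=3, threshold=0.5):
    """Assign every label carried by at least `threshold` of the k nearest documents."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in the labeled documents are ignored.
    q = {t: c * idf[t] for t, c in Counter(query.lower().split()).items() if t in idf}
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    votes = Counter(label for i in ranked[:k] for label in labels[i])
    return {label for label, count in votes.items() if count / k >= threshold}
```

Because each prediction traces back to a handful of labeled neighbor passages, a reviewer can inspect which passages drove a label, which is one plausible reading of the transparency the abstract mentions.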

Talk: Marketplace Experimentation at SpotHero

Presenter:
Amish Popli, Data Scientist, SpotHero

About the Speaker:
Amish Popli is passionate about solving challenging business problems using data science and machine learning. He supports multiple departments at SpotHero including, but not limited to, marketing, sales, and product development. He likes data, manipulating it, making it (simulation), modelling it, visualizing it, and yes, even cleaning it. He works with different PMs and engineers in different domains and has brought many successful products from discovery to production.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 4/7)

Are there any industries (in particular) that are relevant for this talk?
Parking

Who is this presentation for?
Product Managers, Data Scientists/ ML Engineers

What you’ll learn:
Experimentation is a very nebulous topic. There are a lot of companies, articles, and research available on Google, but each company has its own unique way of running experiments and measuring their impact.

Pre-requisite Knowledge:
Knowledge of basic statistical tests

What is unique about this speech, from other speeches given on the topic?
To my knowledge, there is no other company in North America doing data science/ML in the parking industry. The problems we are solving are present in other industries, but parking adds another layer of complexity on top of them.

Abstract of Talk:
SpotHero is the biggest and fastest-growing off-street parking reservation platform in North America. It is a two-sided marketplace involving drivers and parking garage owners. The data science team at SpotHero is working on many interesting problems in areas such as dynamic pricing, marketing, and ranking. One of the key challenges that we face is how to test our machine learning models in production and make sure that the changes we make lead to an improvement in our KPIs. In this talk, I will focus on how SpotHero runs experiments whenever we make improvements or create a new model to generate prices for our parking spots. I will cover why the general A/B test framework will not work in our scenario, describe the various approaches that we considered, and introduce switchback experimentation as an alternative. I will discuss our experiment design and conclude the talk with a result from one of our experiments and our technical architecture.
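For readers new to the term, a switchback experiment typically switches the whole marketplace between variants over successive time windows, rather than splitting users at a single point in time. The sketch below is a generic illustration under that assumption, not SpotHero’s design; the window length, function names and metric are hypothetical.

```python
import random
from statistics import mean


def assign_windows(n_windows, seed=0):
    """Randomly assign each time window to 'control' or 'treatment'."""
    rng = random.Random(seed)
    return [rng.choice(["control", "treatment"]) for _ in range(n_windows)]


def window_index(minutes_since_start, window_minutes=60):
    """Map a timestamp (minutes since the experiment began) to its window."""
    return minutes_since_start // window_minutes


def effect_estimate(observations, assignments, window_minutes=60):
    """Difference in mean window-level outcome, treatment minus control.

    observations: iterable of (minutes_since_start, metric_value) pairs.
    """
    per_window = {}
    for minutes, value in observations:
        per_window.setdefault(window_index(minutes, window_minutes), []).append(value)
    by_arm = {"control": [], "treatment": []}
    # Aggregate to one value per window so the window is the unit of analysis.
    for w, values in per_window.items():
        by_arm[assignments[w]].append(mean(values))
    return mean(by_arm["treatment"]) - mean(by_arm["control"])
```

Treating the window, not the individual booking, as the unit of analysis is the key design choice here. A real analysis would also need to handle carry-over between adjacent windows and time-of-day effects, which this sketch ignores.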

Can you suggest 2-3 topics for post-discussion?
A/B tests, switchback experiments, challenges in running live experiments

Hien Luu

Head of Machine Learning Platform, DoorDash

Hien Luu is a Sr. Engineering Manager at DoorDash, leading the Machine Learning Platform team. He is particularly passionate about the intersection between Big Data and Artificial Intelligence. He is the author of the Beginning Apache Spark 3 book. He has given presentations at various conferences such as GHC 2022, Data+AI Summit, XAI 21 Summit, MLOps World, YOW Data!, apply(), and QCon (SF, NY, London).

Talk: Scaling & Evolving the Machine Learning Platform at DoorDash

Abstract: As DoorDash’s business grows, the online ML prediction volume grows exponentially to support the various Machine Learning use cases, such as ETA predictions, Dasher assignments, personalized restaurant and menu item recommendations, and the ranking of the large volume of search queries.

In this session, we will share our journey of building and scaling our Machine Learning platform, particularly the prediction service, along with the various optimizations we experimented with, the lessons learned, and the technical decisions and tradeoffs made. We will also share how we measure success and how we set goals for the future.

What You’ll Learn: The challenges and lessons learned from building an ML platform to support ML at scale

Track: Case Study

Technical Level: 5

Location: San Jose, CA

Talk: Scaling & Evolving the Machine Learning Platform at DoorDash.

Presenter:
Hien Luu, Head of Machine Learning Platform, DoorDash

About the Speaker:
Hien Luu is a Sr. Engineering Manager at DoorDash, leading the Machine Learning Platform team. He is particularly passionate about the intersection between Big Data and Artificial Intelligence. He is the author of the Beginning Apache Spark 3 book. He has given presentations at various conferences such as GHC 2022, Data+AI Summit, XAI 21 Summit, MLOps World, YOW Data!, apply(), and QCon (SF, NY, London).

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 5/7)

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ML Engineers, and High-level Researchers

What you’ll learn:
The journey in scaling the ML platform

On a scale of 1-10 how mature is this applied AI application you plan to discuss?
10/10

Pre-requisite Knowledge:
High level understanding of microservices

What kind of DevOps tools you plan to discuss? Open source?
CI/CD, Git, MLflow

What are some of the languages you plan to discuss?
Python, Kotlin

What are some of the infrastructures you plan to discuss?
Feature engineering at scale, low latency and high QPS model prediction service

What is unique about this speech, from other speeches given on the topic?
This is a case study about our journey of building the ML platform at DoorDash

Abstract of Talk:
As DoorDash’s business grows, the online ML prediction volume grows exponentially to support the various Machine Learning use cases, such as ETA predictions, Dasher assignments, personalized restaurant and menu item recommendations, and the ranking of the large volume of search queries.

In this session, we will share our journey of building and scaling our Machine Learning platform, particularly the prediction service, along with the various optimizations we experimented with, the lessons learned, and the technical decisions and tradeoffs made. We will also share how we measure success and how we set goals for the future.

Can you suggest 2-3 topics for post-discussion?
Adopting MLOps

Talk: Sensitivity and Interpretability of AI-Models

Presenters:
Hanieh Arjmand, ML Researcher, Lydia.ai & Spark Tseung, Applied Data Scientist, Lydia.ai

About the Speakers:
Hanieh Arjmand is a Machine Learning Researcher at Lydia.ai where she focuses on discovering and applying the best machine learning techniques to healthcare and insurance problems to help insurers use machine learning to protect more people.

Spark Tseung is an Applied Data Scientist at Knowtions Research where he focuses on building frameworks for actuarial and underwriting validation to help insurers use machine learning to protect more people. Spark is working towards his Ph.D. in Statistics and specializes in the application of machine learning methods in Property & Casualty loss modelling and risk selection.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
Using case studies from our work, we will discuss potential approaches for designing some of the sensitivity tests, which have helped us understand different aspects of model behaviours and data biases.

Abstract of Talk:
Model interpretability is important, especially in regulated industries where risk-sensitive decisions typically require transparency and reliability of the underlying model. While model interpretability is often sacrificed in other fields in order to achieve superior predictive performance, this is not the case in regulated industries such as healthcare, where model fairness plays an important role. In this talk, we will present case studies to illustrate the importance of sensitivity analysis for model interpretability and to showcase our design and implementations. Depending on the use cases of machine learning models, sensitivity tests have to be specifically and carefully designed and implemented. Using our machine learning models on electronic health record (EHR) and human activity data, we will discuss potential approaches for designing some of the sensitivity tests, which have helped us understand different aspects of model behaviour and even uncover unwanted biases and behaviours that had to be eliminated.

Talk: Transforming The Retail Industry with Transformers

Presenters:
Kyryl Truskovskyi, Applied Research Scientist, Georgian & Rohit Saha, Applied Research Scientist, Georgian

About the Speakers:
Kyryl has over eight years of experience in the field of Machine Learning. For the bulk of this career, he has helped build machine learning startups, from inception to a product. He has also developed expertise in choosing and implementing state-of-the-art deep learning architectures and large-scale solutions based on them.

Rohit Saha is currently an applied research scientist on Georgian’s R&D team and is assisting portfolio companies with their research endeavours. Owing to previous roles, he has experience building end-to-end machine learning pipelines. He holds a master’s degree from the University of Toronto, and his research interests include generative modelling and transfer learning for Computer Vision tasks.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 6/7)

What you’ll learn:
The insights and findings we share in this talk are derived from using the latest ML techniques and tools for solving real-world use cases at SPINS and are not readily available on the internet. We also open the stage for Q&A at the end of this talk to address the questions from the audience.

Abstract of Talk:
In recent years, we have seen amazing results in artificial intelligence and machine learning owing to the emergence of models such as transformers and pretrained language models. Despite the astounding results published in academic papers, there remains a lot of ambiguity and many challenges when it comes to deploying these models in industry because 1) troubleshooting, training, and maintaining these models is very time-consuming and costly due to their inherently large size and complexity, and 2) there is not yet enough clarity about when the advantages of these models outweigh their challenges compared with classical ML models. These challenges are even more severe for small and mid-sized companies that do not have access to huge compute resources and infrastructure. In this talk, we discuss these challenges and share our findings and recommendations from working on real-world examples at SPINS, a data/tech company focused on the natural grocery industry. More specifically, we describe how we leverage state-of-the-art language models to seamlessly automate parts of SPINS’ data ingestion workflow and drive substantial business outcomes. We provide a walk-through of our end-to-end MLOps system and discuss how using the right tools and methods has helped to mitigate some of these challenges. We also share our findings from our experimentation and provide insights on when one should use these massive transformer models instead of classical ML models. Considering that we have a variety of challenges in our use cases, from an ill-defined label space to a huge number of classes (~86,000) and massive data imbalance, we believe our findings and recommendations can be applied to most real-world settings. We hope that the learnings from this talk can help you to solve your own problems more effectively and efficiently!

Talk: Scaling Advanced Analytics in the Worst Crisis in the Industry Area

Presenters:
Nicolas Venegas Oliva, Technical Lead of Advanced Analytics, LATAM Airlines & Cristóbal Guzmán Wilkendorf, Staff Data Scientist, LATAM Airlines

About the Speakers:
2 years of experience in backend development, 2+ years in data processing and the last 3+ years as Advanced Analytics technical leader at LATAM Airlines. During this time the team has grown from 9 to 48 highly trained professionals. It has also become the team with the highest impact generation within the company and a reference in the region in terms of MLOps and measured business impact through data products.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
Scaling of MLOps teams, impact measurement, selection and training of highly technical teams.

Abstract of Talk:
For data science teams looking to create real business value with AI, MLOps is not something that’s ‘nice to have’; it’s a MUST HAVE. To make MLOps work for your organization, you need to have the right tools combined with the right skillset across the different roles, and a unified process. For LATAM Airlines Group, faced with the worst airline industry crisis following the COVID-19 pandemic, MLOps was imperative. We set off to create a cross-company MLOps strategy and implement it across dozens of use cases. In this talk, we will share our MLOps strategy, provide tips for success and pitfalls to avoid based on our own data science journey, and dive into two of our use cases.

Talk: The Role of Alternative Data in Investing

Presenter:
Serena McDonnell, Lead Data Scientist, Delphia

About the Speaker:
Serena is a Lead Data Scientist and quant researcher at Delphia, where she uses machine learning to power the fund’s long-short equity market neutral strategy. Passionate about knowledge sharing and continuous learning, Serena co-hosts Deep Random Talks, a podcast which focusses on machine learning, product development, and knowledge management. She is an organizer of AI Socratic Circles (AISC), a highly technical machine learning reading group for industry professionals. As part of AISC, Serena leads a research group that focusses on applying natural language processing and representation learning to recommender systems. Serena holds an M.Sc. in Mathematics from the Hong Kong University of Science and Technology, and a B.Sc. in Mathematics and Biology from McGill University.

Which talk track does this best fit into?
Case Study

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
– Understand the advantages of alternative data in investing in general.
– Understand the promise of alternative data in quantitative equity strategies, and the challenges.
– Develop an opinion on the value of alternative data, when to invest in it, and when to consider sticking to more traditional data sources.

Abstract of Talk:
Applying alternative data to quantitative equity strategies has high potential and unique challenges. In this talk, we will use Delphia’s machine learning driven long-short equity market neutral strategy as context to discuss the following:
– Case studies to highlight the advantages of alternative data in investing in general.
– The promise of alternative data in quantitative equity strategies.
– The challenges in working with alternative data in Delphia’s strategy.

Chloé Pou-Prom

Data Scientist, Unity Health Toronto

Chloé Pou-Prom is a data scientist with the Data Science and Advanced Analytics (DSAA) team at Unity Health Toronto. The DSAA team uses high quality healthcare data in innovative ways to catalyze communities of data users and decision makers in making transformative changes that improve patient outcomes and healthcare system efficiency.

Co-Presenter: Vaakesan Sundrelingam

Workshop: NLP for Healthcare: Challenges With Processing and De-Identifying Clinical Notes

Abstract: Clinical notes (e.g., admission notes, nurse notes, radiology reports) are rich with information. In this session, we discuss the challenges of working with text data from two different perspectives. First, we provide an overview of the different issues that one can encounter when working with healthcare data, with an emphasis on data processing and cleaning. Then, we focus on the challenges that arise when it comes to sharing data across hospitals, more specifically de-identifying clinical text data. Finally, we provide a demo of pydeid, a Python-based de-identification software that identifies and replaces personal health information (PHI).

What You’ll Learn: 
1) Why NLP for healthcare is challenging;
2) Why sharing clinical notes across hospitals is difficult; and
3) Some tips and tools to help out with (1) and (2)

Technical Level: 3

Location: Toronto

Workshop: NLP for Healthcare: Challenges With Processing and De-Identifying Clinical Notes

Presenters:
Chloé Pou-Prom, Data Scientist, Unity Health Toronto & Vaakesan Sundrelingam, Data Scientist, Unity Health Toronto

About the Speakers:
Chloé Pou-Prom is a data scientist with the Data Science and Advanced Analytics (DSAA) team at Unity Health Toronto. The DSAA team uses high quality healthcare data in innovative ways to catalyze communities of data users and decision makers in making transformative changes that improve patient outcomes and healthcare system efficiency.

Vaakesan Sundrelingam is a data scientist with the GEMINI team at Unity Health Toronto. GEMINI is Canada’s largest hospital data & analytics study, helping physicians, health care teams, and hospitals use data to gain insights into patient care and improve patient outcomes. GEMINI uses machine learning in creative ways to prepare large amounts of data for researchers, as well as in clinical applications such as to detect particularly difficult to measure conditions for quality of care improvement initiatives.

Technical level of your talk?
(Technical Level: 3/7)

What you’ll learn:
1) Why NLP for healthcare is challenging;
2) Why sharing clinical notes across hospitals is difficult; and
3) Some tips and tools to help out with (1) and (2)

Abstract of Talk:
Clinical notes (e.g., admission notes, nurse notes, radiology reports) are rich with information. In this session, we discuss the challenges of working with text data from two different perspectives. First, we provide an overview of the different issues that one can encounter when working with healthcare data, with an emphasis on data processing and cleaning. Then, we focus on the challenges that arise when it comes to sharing data across hospitals, more specifically de-identifying clinical text data. Finally, we provide a demo of pydeid, a Python-based de-identification software that identifies and replaces personal health information (PHI).
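To give a feel for what “identifies and replaces” can mean in practice, here is a toy rule-based sketch using only Python’s standard `re` module. It is not pydeid and makes no claim about how pydeid works or what its API looks like; the patterns, placeholder format and function name are all hypothetical, and real clinical notes need far broader coverage (names, addresses, free-form dates and more).

```python
import re

# Hypothetical patterns for illustration only; each maps a PHI type to a regex.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}


def deidentify(text):
    """Replace each matched span with a placeholder such as <DATE>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


# Made-up note text, not real patient data.
note = "Seen on 2021-03-04, MRN: 123456, call 416-555-0199."
print(deidentify(note))
```

Replacing spans with typed placeholders, rather than deleting them, keeps the sentence structure readable for downstream NLP while removing the identifying values.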

Stefanie Molin

Software Engineer / Data Scientist, Bloomberg

Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Workshop: Beyond the Basics: Data Visualization in Python

Abstract: The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.

What You’ll Learn: Data visualization is essential for anyone working with data, but sometimes it can be difficult to create impactful visualizations in Python. In this workshop, we will move beyond the plotting basics and explore how to make compelling static, animated, and interactive visualizations.

Technical Level: 4

Location: New York City

Workshop: Beyond the Basics: Data Visualization in Python

Presenter:
Stefanie Molin, Software Engineer / Data Scientist, Bloomberg

About the Speaker:
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.

Which talk track does this best fit into?
Workshop (1.5-4 hours)

Technical level of your talk?
(Technical level: 4/7)

Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers, Researchers

What you’ll learn:
A workshop gives attendees opportunities to ask questions to make sure they understand the concepts. Attendees will also work through curated examples using real-world data rather than the dummy or randomly generated data used nearly everywhere else. Each visualization is also created step by step, showing how it changes with each command, which gives attendees a much stronger grasp of concepts that they can apply elsewhere.

What is the main core message (learning) you want attendees to take away from this talk?
Data visualization is essential for anyone working with data, but sometimes it can be difficult to create impactful visualizations in Python. In this workshop, we will move beyond the plotting basics and explore how to make compelling static, animated, and interactive visualizations.

Pre-requisite Knowledge:
You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself with using notebooks in JupyterLab and the additional functionality in JupyterLab. In addition, a basic understanding of pandas will be beneficial, but is not required; reviewing the first section of my pandas workshop will be sufficient.

What is unique about this speech, from other speeches given on the topic?
My teaching style is very different: the code examples I provide are carefully chosen so that it is easy to see why you would take the approach I show, and I make sure that attendees understand exactly what each line of code is doing to make that happen. I find that this gives attendees knowledge they can apply to other problems, rather than just knowing that the code as a whole has some effect. They get a deeper understanding and can use the concepts like building blocks for their own use cases. Attendees often praise the content in the slides as a detailed reference for later as well.

Abstract of Talk:
The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.

While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine-tune the resulting visualizations (e.g., by adding annotations or animating them). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.
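
As a rough illustration of the kind of fine-tuning mentioned above, the sketch below adds an annotation to a basic Matplotlib line plot. The data values are made up for illustration and are not taken from the workshop materials; the example assumes Matplotlib is installed.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up monthly values, used only for illustration.
months = list(range(1, 13))
values = [3, 4, 6, 5, 8, 12, 9, 7, 6, 5, 4, 3]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, values, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.set_title("Annotating a peak")

# Locate the maximum and point an arrow at it.
peak_index = values.index(max(values))
peak_x, peak_y = months[peak_index], values[peak_index]
ax.annotate(
    f"peak: {peak_y}",
    xy=(peak_x, peak_y),          # the point being annotated
    xytext=(peak_x + 2, peak_y),  # where the label text is placed
    arrowprops={"arrowstyle": "->"},
)

# Render to an in-memory PNG instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Because libraries built on Matplotlib typically hand back the same `Axes` objects, a call like `ax.annotate` can usually be applied to their plots in the same way.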

Can you suggest 2-3 topics for post-discussion?
Anything relating to the content covered, building data tools, or writing a book/creating workshops

Patricia Thaine

Co-Founder & CEO, Private AI

Patricia Thaine is the Co-Founder & CEO of Private AI, a Microsoft-backed startup. She is also a Computer Science PhD Candidate at the University of Toronto (on leave) and a Vector Institute alumna. Her R&D work is focused on privacy-preserving natural language processing, with an emphasis on applied cryptography and re-identification risk. She also does research on computational methods for lost language decipherment. Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has ten years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics, and the Public Health Agency of Canada.

Workshop: Demystifying De-Identification

Abstract: Workshop with discussion and demo. The session will begin with an overview of privacy enhancing technologies and then dive into de-identification terminology (de-identification, anonymization, redaction, pseudonymization), how these have been misunderstood, and what to think about when choosing between one of these and other privacy enhancing technologies.
Attendees should bring a sample dataset (preferably made up of unstructured text) and have a use case in mind. Each attendee will receive an API key to process a data sample, and we will discuss the results. Data can be in languages other than English; please confirm with the organizer first that the language is supported.
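
To make two of the terms above concrete, here is a minimal sketch contrasting redaction with pseudonymization on e-mail addresses. It is only an illustration under simplifying assumptions: the regular expression, the sample text, and the secret are all hypothetical, it does not use the Private AI API, and neither function amounts to anonymization.

```python
import hashlib
import hmac
import re

# Simplified e-mail pattern, good enough for the illustration only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text):
    """Redaction: replace every match with a fixed label, discarding the value."""
    return EMAIL_PATTERN.sub("[EMAIL]", text)


def pseudonymize(text, secret):
    """Pseudonymization: replace every match with a stable stand-in token.

    The same input value always maps to the same token, so records can still
    be linked, and whoever holds the secret could rebuild the mapping.
    """

    def stand_in(match):
        value = match.group(0).lower().encode("utf-8")
        digest = hmac.new(secret.encode("utf-8"), value, hashlib.sha256).hexdigest()
        return f"EMAIL_{digest[:8]}"

    return EMAIL_PATTERN.sub(stand_in, text)


sample = "Contact ana@example.com or bob@example.org; ana@example.com replies fastest."
redacted = redact(sample)
pseudonymized = pseudonymize(sample, secret="hypothetical-secret")
print(redacted)  # Contact [EMAIL] or [EMAIL]; [EMAIL] replies fastest.
print(pseudonymized)
```

In the redacted output the three addresses are indistinguishable, while in the pseudonymized output the first and third tokens match, which keeps the text useful for analysis but also leaves a linkage that has to be accounted for when assessing re-identification risk.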

What You’ll Learn: Attendees will learn which privacy enhancing technologies are best for their use case, when de-identification is right for them, and how to avoid misusing terminology such as “anonymization”.

Technical Level: 4

Location: Toronto

Workshop: Demystifying De-Identification

Presenter:
Patricia Thaine, Co-Founder & CEO, Private AI

Technical level of your talk?
(Technical Level: 4/7)

Denys Linkov

ML Lead, Voiceflow

Started the ML team at Voiceflow, kickstarted RBC’s MLOps journey, and was the youngest Senior Architect at RBC. Leads discussion groups and mentorship on MLOps and writes various blog posts.

Workshop: Iterating on NLP Models from R&D to Production

Abstract: Research papers, blogs and products are the culmination of many hours of work, iteration and frustration. However, in these final polished formats, we often gloss over the iteration and creative process that led to the desired results.

In this talk, I’ll cover a series of short labs that mirror some of the challenges we’ve faced in building out our NLP models and algorithms. It will be an interactive session with collaborative problem solving and explanations of what we built and the process we took along the way.

Some of the twists and turns will include:
– Integrating a BERT-based model with a message queue system
– Speeding up semantic search through vectorization
– Enabling multilingual recommendations

Each attendee will have access to the code examples and will be encouraged to think beyond the challenges addressed and about how they can apply some of our lessons learned to their own work.
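
The first item in the list above can be pictured with a small producer/consumer sketch. This is not the speaker's code: the standard-library `queue.Queue` stands in for a real message queue system, and `embed` is a hypothetical placeholder for a BERT-based model call.

```python
import queue
import threading


def embed(text):
    """Hypothetical stand-in for a BERT-based model; returns a toy 2-number vector."""
    return [float(len(text)), float(text.count(" ") + 1)]


def worker(inbox, outbox):
    """Consume (message_id, text) pairs until a None sentinel arrives."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel: no more work
            break
        message_id, text = message
        outbox.put((message_id, embed(text)))


inbox = queue.Queue()   # requests waiting for the model
outbox = queue.Queue()  # model outputs waiting to be picked up

consumer = threading.Thread(target=worker, args=(inbox, outbox))
consumer.start()

# Producer side: enqueue requests, then signal shutdown.
for message_id, text in enumerate(["hello there", "reset my password"]):
    inbox.put((message_id, text))
inbox.put(None)
consumer.join()

# Collect results by message id once the worker has finished.
results = {}
while not outbox.empty():
    message_id, vector = outbox.get()
    results[message_id] = vector
print(results)
```

Decoupling the model from its callers this way lets slow inference run at its own pace; a production setup would swap the in-process queues for a real broker and handle retries and failures.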

What You’ll Learn: 
– How to go from idea to product
– How to iterate on a product
– How to go to production
– How to incorporate customer feedback

Technical Level: 6

Location: Toronto

Workshop: Iterating on NLP models from R&D to Production

Presenter:
Denys Linkov, ML Lead, Voiceflow

Technical level of your talk?
(Technical level: 6/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service, Any startup / large company looking at the R&D process

Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers, Researchers

What you’ll learn:
Specific examples and challenges of building NLP products

What is the main core message (learning) you want attendees to take away from this talk?
– How to go from idea to product
– How to iterate on a product
– How to go to production
– How to incorporate customer feedback

What is unique about this speech, from other speeches given on the topic?
Nothing especially unique, but startups of this size rarely share their experience of building and iterating on products. Many tutorials cover the basics rather than real business problems.

Can you suggest 2-3 topics for post-discussion?
BERT-based models and embeddings
Deploying models into production
NLP product development

Dr. Nasim Abdollahi

Postdoctoral Fellow, University of Toronto / Machine Learning Researcher, Cyclica

Nasim is a Postdoctoral Fellow at the University of Toronto and a Machine Learning Researcher Intern at Cyclica, leading a collaborative project between Cyclica, the University of Toronto and the Vector Institute. She is the vice-chair of the Engineering in Medicine and Biology Society of the IEEE Toronto Section. Nasim obtained her Ph.D. in electrical and computer engineering from the University of Manitoba and holds M.Sc. and B.Sc. degrees in biomedical engineering. With her passion for developing and applying novel machine learning techniques to improve the quality of health care, she has conducted numerous research projects on enhancing biomedical imaging for breast cancer detection and monitoring. Her current research is focused on graph-based machine learning models that can predict proteins’ biological functions from their 3D atomic structures, with the promise of improving the design of novel medicines. Nasim is an advocate for women in STEM, serves as vice-chair of IEEE Canada Women in Engineering, and was recognized as a “Visionary Emerging Leader”.

Co-Presenter: Dr. Farnoosh Khodakarami

Workshop: Graph Neural Network Modeling in Drug Discovery Using PyTorch

Abstract: Graph Neural Networks (GNNs) are among the most popular neural network architectures, and because graphs are a natural representation for proteins and molecules, GNNs have shown great promise in graph-based ML modeling for drug discovery and protein science. Graph-based ML models can help identify the topology of a protein structure from its sequence, predict a protein’s biological functions from its structure, and identify protein-protein and protein-drug interactions. In this workshop, we will give an introduction to GNNs and their application in drug discovery, followed by a coding session on PyTorch Geometric, a PyTorch library for building GNN models for structured data. We will then have a code-based session to walk you through two useful tools built with PyTorch Geometric: TorchDrug and NodeCoder.
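
For readers new to GNNs, the core idea of treating a molecule as a graph and letting atoms exchange information with their bonded neighbors can be sketched in plain Python. This is a conceptual illustration only, not PyTorch Geometric code: the toy molecule, the two-element feature vectors, and the mean aggregation are assumptions chosen for simplicity, and a real GNN layer would add learned weights and a nonlinearity.

```python
def message_passing_round(features, edges):
    """One round of mean aggregation over each node and its neighbors."""
    neighbors = {node: [] for node in features}
    for a, b in edges:  # bonds are undirected
        neighbors[a].append(b)
        neighbors[b].append(a)

    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors[node]]
        # Average each feature dimension across the node and its neighbors.
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Toy graph for the heavy atoms of ethanol (C-C-O).
# Feature vector per atom: [is_carbon, is_oxygen].
features = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
edges = [("C1", "C2"), ("C2", "O")]

after_one_round = message_passing_round(features, edges)
for atom, vector in after_one_round.items():
    print(atom, vector)
```

After one round the middle carbon already carries some "oxygen" signal from its neighbor, which is the mechanism that lets stacked layers capture the local chemical environment of each atom.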

What You’ll Learn: Audience will learn about:
– Graph Neural Network (GNN) in drug discovery
– How to build GNN with PyTorch Geometric
– TorchDrug – ML platform for drug discovery
– TorchProtein – an ML library for protein science
– NodeCoder – a graph-based ML framework for predicting proteins’ biological functions

Technical Level: 7

Location: Toronto

Workshop: Graph Neural Network Modeling in Drug Discovery Using PyTorch

Presenters:
Dr. Nasim Abdollahi, Postdoctoral Fellow at University of Toronto, Machine Learning Researcher at Cyclica & Dr. Farnoosh Khodakarami, Computer Scientist & ML Researcher, Cyclica

About the Speaker:
Farnoosh Khodakarami is an experienced computer scientist with a demonstrated history of working in the research industry. She is skilled in application development, with experience in machine learning applications, and is a strong research professional with a Doctor of Philosophy (Ph.D.) in Computer Science. She is creative, self-motivated, and a committed team player with strong problem-solving skills and the ability to grasp new concepts quickly.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 7/7)

Are there any industries (in particular) that are relevant for this talk?
Hospital & Health Care

What is the main core message (learning) you want attendees to take away from this talk?
Audience will learn about:
– Graph Neural Network (GNN) in drug discovery
– How to build GNN with PyTorch Geometric
– TorchDrug – ML platform for drug discovery
– TorchProtein – an ML library for protein science
– NodeCoder – a graph-based ML framework for predicting proteins’ biological functions

Arthur Vitui

Senior Data Scientist Specialist Solution Architect, RedHat Canada

Arthur is a senior data scientist specialist solution architect at RedHat Canada where, with the help of open source software, he helps organizations develop intelligent application ecosystems and bring them into production using MLOps best practices.
He is also pursuing his Ph.D. degree in Computer Science at Concordia University, Montreal, Canada, and he is a research assistant in the Software Performance Analysis and Reliability (SPEAR) Lab.
His research interests are related to AIOps, with a focus on performance and scalability optimization.

Workshop: Open Source Intelligent Application Delivery on Kubernetes

Abstract: The recent rise in popularity of containerized workloads demanded better ways to orchestrate and manage these workloads, hence the creation of the Kubernetes platform.

When it comes to running intelligent application workloads, which contain built-in AI/ML software components, the requirements for a Kubernetes platform as a service extend beyond agility, portability, flexibility and scalability: the platform must also answer the data scientist’s dilemma of getting started and getting into production.

However, as the ML code is only a small part of the entire intelligent application ecosystem, with this workshop we present a showcase for using a Kubernetes platform and a blueprint architecture that proposes an answer to many challenges related to the development, deployment and management of distributed applications.
The user stories we shall focus on in this workshop concerning the developer, data scientist and operations engineer personas are:
– As a data scientist, I want to develop ML models using Jupyter Hub (lab/notebooks) as my preferred research environment.
– As a data scientist, I want my model to be deployed quickly so that it may be used by other applications.
– As a (fullstack) developer, I want to have quick access to resources that support the business logic of my applications, including databases, storage, messaging.
– As a (fullstack) developer, I want an automated build process to support new releases/code updates as soon as they are available in a git repository.
– As an operations engineer, I want an integrated monitoring dashboard for new applications available on the (production) infrastructure.
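
As one way to picture the "deploy my model quickly" user story, the sketch below builds a minimal Kubernetes Deployment manifest as a Python dictionary and prints it as JSON. It is an illustration, not the workshop's blueprint: the application name, image reference, port, and replica count are hypothetical placeholders.

```python
import json


def deployment_manifest(name, image, port, replicas=1):
    """Return a minimal apps/v1 Deployment manifest for a single container."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical model-serving container; the image reference is a placeholder.
manifest = deployment_manifest(
    name="model-server",
    image="registry.example.com/model-server:0.1",
    port=8080,
    replicas=2,
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`; in practice a Service or route would also be needed so that other applications can reach the model.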

What You’ll Learn: Open source container platforms are a great option to integrate Machine Learning with any application or service by boosting productivity while maintaining a high level of security.

Technical Level: 4

Location: Bossard

Workshop: Open Source Intelligent Application Delivery on Kubernetes

Presenter:
Arthur Vitui, Senior Data Scientist Specialist Solution Architect, RedHat Canada

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Hospital & Health Care, Information Technology & Service, Insurance

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Product Managers, Data Scientists/ ML Engineers, ML Engineers, Researchers

What you’ll learn:
The audience will learn what an intelligent application is and how to orchestrate its design, deployment and monitoring in a Kubernetes environment. The audience will also learn about the data scientist’s dilemma and how it may be addressed.

What is the main core message (learning) you want attendees to take away from this talk?
Open source container platforms are a great option to integrate Machine Learning with any application or service by boosting productivity while maintaining a high level of security.

Pre-requisite Knowledge:
Generic SDLC and basic Kubernetes knowledge

What is unique about this speech, from other speeches given on the topic?
It brings an enterprise perspective and an enterprise-ready Kubernetes platform that go beyond a proof of concept (POC), while still presenting a POC showcase for an end-to-end intelligent application.

Can you suggest 2-3 topics for post-discussion?
– Data scientist Kubernetes Platform as a Service
– Automating builds and exposure of ML model inference endpoints

Jörg Schad

CTO, ArangoDB

Jörg Schad is CTO at ArangoDB and enjoys working in intersections of (Graph) databases, Cloud Architectures, and Machine Learning. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.

Workshop: Graph ML – The Next Level of Machine Learning

Abstract: This workshop focuses on why Graphs have become one of the biggest trends in Machine Learning. Graph Machine Learning based on Graph Analytic Algorithms is driving significant improvements in Fraud/Anomaly Detection, Ranking (PageRank), Recommendation Engines (collaborative filtering), text summarization, and other NLP tasks. We will cover Graph Analytic Algorithms, their applications, and the more novel, but equally exciting, field of Graph Machine Learning, including topics such as Graph Neural Networks, Graph Embeddings, and applications of Graph Machine Learning.

The workshop will be hands-on, based on Jupyter notebooks, and will cover the following sessions:
– Why Graph and Graph Thinking
– Graph Algorithms
– Graph Embeddings
– Graph Neural Networks
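
As a small taste of the graph algorithms session, here is a plain-Python sketch of PageRank, the ranking algorithm mentioned in the abstract, computed by power iteration. The four-node link graph is made up for illustration, and the damping factor of 0.85 and the fixed iteration count are conventional choices rather than anything specified by the workshop.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each node to the nodes it points to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a baseline share from random jumps.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Made-up link graph: every target must also appear as a key.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Node `c` ends up ranked highest because three other nodes link to it, while `d`, which nothing links to, keeps only the baseline share.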

What You’ll Learn: Graph Machine Learning treats relations (and neighborhood context) as first-class citizens and hence can lead to more powerful and simpler machine learning models.

Technical Level: 6

Location: Berlin/San Francisco

Workshop: Graph ML – The Next Level of Machine Learning

Presenter:
Jörg Schad, CTO, ArangoDB

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

Are there any industries (in particular) that are relevant for this talk?
All industries

What is the main core message (learning) you want attendees to take away from this talk?
Machine learning is much more than just building models and the overall pipeline should be considered early on in order to result in actual business impact. Luckily there exist a number of Open-Source projects to help…

Presenter:
Jörg Schad, CTO, ArangoDB

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

Are there any industries (in particular) that are relevant for this talk?
All industries

What is the main core message (learning) you want attendees to take away from this talk?
Machine learning is much more than just building models and the overall pipeline should be considered early on in order to result in actual business impact. Luckily there exist a number of Open-Source projects to help…

Abstract of Talk:
Many machine learning projects fail to turn an initial idea, and potentially even a first model, into business impact because they neglect the importance (and associated work) of building a production-grade ML pipeline. There are many great tutorials for training your deep learning models using PyTorch, TensorFlow, Keras, Spark or one of the many other frameworks. But training is only a small part of the overall deep learning pipeline. This workshop gives an overview of building a complete automated deep learning pipeline, starting with exploratory analysis and continuing through training, model storage, model serving, metadata storage, and monitoring, using available Open-Source tools.

The participants will build an end-to-end data analytics pipeline including:
– Pipeline Orchestration
– Data preparation using Apache Spark
– Jupyter Notebooks
– Distributed training with TensorFlow
– Automation & CI/CD using Jenkins and Argo
– Model and metadata storage
– Model serving and monitoring
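
The pipeline orchestration idea in the list above can be sketched with the standard library alone: stages declare what they depend on, and a topological sort decides a valid execution order. This is a toy stand-in for the dedicated orchestration tools the workshop covers; the stage names and the placeholder actions are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it starts.
dependencies = {
    "prepare_data": set(),
    "train": {"prepare_data"},
    "store_model": {"train"},
    "store_metadata": {"train"},
    "serve": {"store_model"},
    "monitor": {"serve"},
}


def run_pipeline(dependencies, actions):
    """Run every stage once, in an order that respects the dependencies."""
    order = list(TopologicalSorter(dependencies).static_order())
    log = [actions[stage]() for stage in order]
    return order, log


# Placeholder actions; a real pipeline would launch jobs here.
actions = {stage: (lambda stage=stage: f"ran {stage}") for stage in dependencies}

order, log = run_pipeline(dependencies, actions)
print(order)
```

Declaring dependencies rather than hard-coding a sequence is what lets an orchestrator run independent stages (here, storing the model and storing its metadata) in parallel and rerun only what changed.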

Mahmudul Hasan

Lead Data Scientist, TELUS Business Marketing

With deep expertise in Machine Learning and AI, Mahmudul has over 10 years of industry experience building enterprise-level data products that drive digital transformation, improve customer experience, create new revenue opportunities, and deliver cost savings for companies across the globe. He currently serves as a Lead Data Scientist in TELUS Business Marketing. Mahmudul also designed and developed NLP course content for the University of Toronto School of Continuing Studies, where he serves as an instructor for the course.
Mahmudul holds a Master’s degree in Management Science from University of Waterloo and a Bachelor’s in Computer Science & Engineering.

Workshop: Introduction to NLP & a Step by Step Implementation of a Real World Use Case from TELUS

Abstract: The workshop will be delivered in two parts:
– Part 1: A brief introduction to NLP concepts and ideas, including:
– Basic definitions and use cases
– Why NLP is a different ball game within AI/ML (the major challenges of processing natural language, etc.)
– How those challenges are overcome with an ML-based approach
– The major workflow for building an NLP application
– Part 2: A detailed implementation, with code, of a case study that I implemented at TELUS. In this part, the audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning project financially successful for the company.
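
To give a flavor of working with unstructured text without labels, the sketch below tokenizes a few short notes and weights each term with TF-IDF. It is a generic illustration, not the TELUS case study: the sample notes are invented, and the tokenizer and the unsmoothed TF-IDF formula are simplifying assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document (no smoothing)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Invented support notes, used only for illustration.
notes = [
    "Customer reports slow internet after the modem upgrade",
    "Customer asks about the billing date",
    "Modem keeps restarting and the internet drops",
]
weights = tf_idf(notes)
for note_weights in weights:
    ranked = sorted(note_weights, key=note_weights.get, reverse=True)
    print(ranked[:3])
```

Words that appear in every note, such as "the", receive a weight of zero, while words specific to one note stand out; vectors like these are a common starting point for unsupervised steps such as clustering or topic discovery.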

What You’ll Learn: The audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning project financially beneficial for the business.

Technical Level: 6

Location: Toronto

Workshop: Introduction to NLP & a Step by Step Implementation of a Real World Use Case from TELUS

Presenter:
Mahmudul Hasan, Lead Data Scientist, TELUS Business Marketing

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 6/7)

Are there any industries (in particular) that are relevant for this talk?
Computer Software, Marketing & Advertising, Telecommunications

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Data Scientists/ ML Engineers

What you’ll learn:
The audience will see a real-world case study of how an unsupervised NLP algorithm can successfully create value for a business, along with some tips and tricks that help a data scientist make this kind of project successful.

What is the main core message (learning) you want attendees to take away from this talk?
The audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning project financially beneficial for the business.

Pre-requisite Knowledge:
Some basic understanding of data science

What is unique about this speech, from other speeches given on the topic?
The audience will get an idea of how unstructured data can be turned into financially impactful benefits for a business. I will also share some tips on how to make this kind of unsupervised-learning project a success at a big corporation like TELUS.

Can you suggest 2-3 topics for post-discussion?
1. What are the challenges of implementing a data science project in business?
2. How can you make your AI/ML project impactful for the business?

Workshop: Train your Models Faster by Learning How to Profile and Apply System-Level Optimizations

Presenters:
Akbar Nurlybayev, Co-Founder/VP of Engineering, CentML & Xin Li, Research Engineer, CentML & Yubo Gao, Research Engineer, CentML

About the Speakers:
Akbar is the Co-founder and VP of Engineering at CentML. Previously, he was Director of Data at KAR Global, a $2 billion publicly traded company. Yubo and Xin are PhD students at the UofT Efficient Computing Lab.

Xin Li is a former member of AI Technical Staff at Vector Institute’s AI Engineering team. Working within the vibrant community at Vector Institute, Xin collaborates with Vector researchers and industry partners to make Deep Learning research more accessible in applied settings. Currently, he is working as a Research Engineer at CentML.

Yubo Gao has recently completed his undergraduate degree at the University of Toronto and has joined the EcoSystem lab as a PhD student, where he is fortunate to be supervised by Prof. Gennady Pekhimenko.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
This workshop provides unique opportunities for attendees to learn and perform system-level optimizations for deep learning. This topic is often overlooked among ML practitioners due to time and resource constraints. This workshop strives to provide practitioners with some easy-to-use and practical tools to help them understand and optimize their workloads. This workshop also brings a unique perspective on the importance of hardware efficiency when working with Deep Learning models.

Abstract of Talk:
Everybody nowadays trains models. Every year the size of the state-of-the-art models grows faster than the hardware becomes cheaper. We observed that many organizations significantly underutilize the available hardware accelerators, such as Nvidia GPUs, and as a result are overpaying for both ML training and inference. In this workshop, our team of world-class ML Systems researchers will share various techniques and tools we use to profile and optimize deep learning models. We will demonstrate how the insights learned from profiling can be used to discover optimization opportunities that make deep learning models utilize hardware more efficiently. This results in reduced training time, faster model iteration and ultimately lower cost for organizations.

Workshop: Distributed Training with PyTorch

Presenter:
Shagun Sodhani, Research Engineer, Meta AI

About the Speaker:
Research Engineer at Meta AI, previously at Mila and Adobe Research

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
By the end of the session, attendees will be able to take a simple PyTorch model and scale it to work with dozens of machines. For the straightforward use cases, this will require writing just a few lines of code.

Abstract of Talk:
PyTorch is one of the most popular ML frameworks, with recent releases focusing on enhanced support for distributed training. This talk discusses the different distributed training mechanisms provided by PyTorch. It should be helpful for both practitioners & researchers who want to train larger models, faster.

Hands-on Workshop: Introduction to Kubernetes for MLOps

Presenter:
Eric Hammel, MLOps Engineer, Rocket Science Development

About the Speaker:
A resourceful professional able to bridge skills between Data Science and Infrastructure (Cloud and HPC) to deliver valuable solutions. He has experience in prototyping, deploying, and monitoring distributed workloads, helping organizations translate real-life business problems into scalable data science solutions that generate value.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
The participants will get a crash course in Kubernetes and Cloud Native concepts. They will learn how to deploy an application on a managed Kubernetes cluster.

Abstract of Talk:
Have you ever wondered what Kubernetes and Cloud Native applications are?
Here is the perfect opportunity to get exposed to these complex yet powerful tools & concepts.
You will discover Container Orchestration, Cloud Native applications, Kubernetes, and application deployment.

Workshop: Time Series Anomaly Detection with Machine Learning

Presenters: Benjamin Ye, Applied Research Scientist, Georgian & Angeline Yasodhara, Applied Research Scientist, Georgian

About the Speaker:
Applied Research Scientists

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
Time series anomaly detection methods and applications

Abstract of Talk:
Traditional methods in time series anomaly detection yield good results for relatively simple tasks, but they often fall short when it comes to harder problems of dealing with long-range dependencies, multivariate time series, and subtle contextual anomalies. We introduce a toolkit incorporating classical and novel machine learning techniques (N-BEATS, Transformers, etc.) as well as recent thresholding methods to overcome these challenges.

We will discuss their benchmark results against different anomaly types for both univariate and multivariate cases. We will walk through how you can use this simple toolkit and easily incorporate these techniques into your application.
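For orientation, the kind of "traditional method" the abstract contrasts against can be sketched as a rolling z-score detector. This is a minimal illustration, not the presenters' toolkit: the function name, the window size, the threshold and the synthetic series are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations away from the mean of the trailing `window` points."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(series):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows, where the z-score is undefined.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Synthetic univariate series (hypothetical data): a small repeating
# pattern with one spike injected at index 60.
series = [10.0 + (i % 5) * 0.1 for i in range(100)]
series[60] = 25.0

print(rolling_zscore_anomalies(series))
```

A detector like this only sees a short trailing window of a single variable, which is why it struggles with the long-range, multivariate and contextual cases the abstract mentions.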

Workshop: Four Data and Analytics Initiatives and Strategy to Achieve Excellence

Presenters: Eric Huang, Founder & CEO, Advanced Analytics and Research Lab & Michael Woolfson, Client Lead & Development, Advanced Analytics and Research Lab

About the Speaker:
Eric Huang is the Founder and CEO of Advanced Analytics and Research Lab (AAARL.CA), a data science, analytics and AI services and solutions firm. The company helps organizations to fully streamline and utilize data to increase productivity, improve insights and ultimately achieve their goals. Eric has an undergraduate degree in Honors Business Administration, a Master of Science in Analytics from Ivey Business School, as well as an Honors Specialization in Economics from Western University. He has worked in various capacities in consulting, business development, finance, and academia, and has experience teaching undergraduate and master’s-level students in fun, engaging, and practical ways. Eric is a fun and friendly individual who loves to learn about everything in the world, and is also an avid coffee drinker, barista, photographer and volunteer.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 3/7)

What you’ll learn:
There are blind spots where business and analytics meet; we will go through some common ones and how to resolve them. Organizations new to analytics will also gain some know-how and confidence for starting new analytics initiatives.

Abstract of Talk:
This talk will go through practical initiatives to supercharge your existing data and analytics strategies. For those just starting out, it also offers frameworks for how to start a data and analytics function. We will go through the following topics: an introduction to areas of application in data and analytics for industry, establishing data and analytics strategies, setting up data and performance tracking, establishing key performance indicators, and establishing a data-driven culture.

Workshop: Testing for Fairness in AI HR Systems: Hidden Dangers and Real-World Lessons on How To Detect and Prevent Bias

Presenter:
Dan Adamson, CEO and Co-Founder, Armilla AI

About the Speaker:
Dan Adamson is the Co-Founder and CEO of Armilla.AI, a company helping institutions create trust in their AI. He co-founded PointChain Technologies, an AI-based neo-banking platform for high-risk industries and was Founder/CEO of OutsideIQ until its acquisition by Exiger, where he remained as their President overseeing product and cognitive computing research. OutsideIQ deployed AML and anti-fraud models to over 100 global financial institutions and built AI solutions for the HR and Insurance industries. He also previously served as Chief Architect at Medstory, a vertical search start-up acquired by Microsoft. Adamson holds several search algorithm and cognitive computing patents, and holds a Master of Science degree from U.C. Berkeley.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 3/7)

Abstract of Talk:
HR systems can perpetuate biases, representing a significant risk to organizations and a source of harm to candidates. In this tutorial, we will review how to detect bias issues in HR systems, including resume screening and promotion models, with Armilla, a QA for ML tool that is being used for formal assessments, including those under the new New York City bias law. We’ll look at hidden biases and common motifs that can cause these systems to fail, as well as suggestions for making these systems more robust.

Bhaskarjit Sarmah

Senior Data Scientist, BlackRock

Bhaskarjit is a data scientist and has solved business problems in many domains including Retail, FMCG, Banking, Media & Entertainment etc. using machine learning. Currently he is working as a data scientist at BlackRock, where he builds predictive models for financial markets. His research interests are Network Science, AI Interpretability, Uncertainty, NLP etc.

Workshop: Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

Abstract: Understanding non-linear relationships among financial instruments has various applications in investment processes, ranging from risk management and portfolio construction to trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix, which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes representing the corresponding pair-wise correlation coefficients. The traditional network science techniques, which are extensively utilized in financial literature, require handcrafted features such as centrality measures to understand such correlation networks.

However, manually listing all such handcrafted features may quickly turn out to be a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network in an algorithmic way using a graph machine learning algorithm called Node2Vec.

In particular, the algorithm compresses the network into a lower dimensional continuous space, called an embedding, where pairs of nodes that are identified as similar by the algorithm are placed closer to each other. By using log returns of S&P 500 stock data, we show that our proposed algorithm can learn such an embedding from its correlation network. We define various domain specific quantitative (and objective) and qualitative metrics that are inspired by metrics used in the field of Natural Language Processing (NLP) to evaluate the embeddings in order to identify the optimal one. Further, we discuss various applications of the embeddings in investment management.

What You’ll Learn: In this paper, we show how to create a stock embedding representation from a stock correlation matrix, and how to evaluate the learned embeddings quantitatively.

Pre-requisite Knowledge: Network Science, Machine Learning, Word Embeddings

Technical Level: 5

Location: Delhi

Workshop: Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

Presenter:
Bhaskarjit Sarmah, Senior Data Scientist, BlackRock

About the Speaker:
Bhaskarjit is a data scientist and has solved business problems in many domains including Retail, FMCG, Banking, Media & Entertainment etc. using machine learning. Currently he is working as a data scientist at BlackRock, where he builds predictive models for financial markets. His research interests are Network Science, AI Interpretability, Uncertainty, NLP etc.

Which talk track does this best fit into?
Research: Advanced Technical.

Technical level of your talk?
(Technical level: 4/7)

Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Information Technology & Service, Insurance, Marketing & Advertising

Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ML Engineers and High-level Researchers, ML Engineers, Researchers

What you’ll learn:
In this paper, we show how to create a stock embedding representation from a stock correlation matrix, and how to evaluate the learned embeddings quantitatively.

What is the main core message (learning) you want attendees to take away from this talk?
How to represent financial securities in the form of embeddings using graph machine learning

Pre-requisite Knowledge:
Network Science, Machine Learning, Word Embeddings

What is unique about this speech, from other speeches given on the topic?
This speech is centered around feature extraction from networks. It will first introduce the traditional hand-crafted feature extraction techniques for networks, then explain how graph machine learning can be used for automatic feature extraction in the form of embeddings, and finally show how to evaluate those embeddings quantitatively.

Abstract of Talk:
Understanding non-linear relationships among financial instruments has various applications in investment processes, ranging from risk management and portfolio construction to trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix, which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes representing the corresponding pair-wise correlation coefficients. The traditional network science techniques, which are extensively utilized in financial literature, require handcrafted features such as centrality measures to understand such correlation networks. However, manually listing all such handcrafted features may quickly turn out to be a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network in an algorithmic way using a graph machine learning algorithm called Node2Vec. In particular, the algorithm compresses the network into a lower dimensional continuous space, called an embedding, where pairs of nodes that are identified as similar by the algorithm are placed closer to each other. By using log returns of S&P 500 stock data, we show that our proposed algorithm can learn such an embedding from its correlation network. We define various domain specific quantitative (and objective) and qualitative metrics that are inspired by metrics used in the field of Natural Language Processing (NLP) to evaluate the embeddings in order to identify the optimal one. Further, we discuss various applications of the embeddings in investment management.
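The pipeline described in this abstract (log returns, correlation network, Node2Vec walks) can be sketched in plain Python. This is a rough illustration under stated assumptions rather than the speaker's implementation: the tickers and prices are made up, only positively correlated pairs are kept as edges so that walk weights stay positive, and the final skip-gram step that turns walks into embedding vectors is omitted.

```python
import math
import random


def log_returns(prices):
    """Log return between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_graph(price_series, min_corr=0.0):
    """Weighted undirected graph: nodes are tickers, weights are correlations.
    Assumption: edges at or below `min_corr` are dropped."""
    returns = {t: log_returns(p) for t, p in price_series.items()}
    tickers = list(returns)
    graph = {t: {} for t in tickers}
    for i, a in enumerate(tickers):
        for b in tickers[i + 1:]:
            c = pearson(returns[a], returns[b])
            if c > min_corr:
                graph[a][b] = c
                graph[b][a] = c
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Second-order biased random walk in the style of Node2Vec.
    p controls returning to the previous node, q controls moving outward."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            weights = [graph[cur][n] for n in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for n in neighbors:
                w = graph[cur][n]
                if n == prev:            # step back to the previous node
                    weights.append(w / p)
                elif n in graph[prev]:   # stay close to the previous node
                    weights.append(w)
                else:                    # move further away
                    weights.append(w / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical tickers and prices, chosen so that AAA/BBB move together
# and CCC/DDD move together but against AAA/BBB.
prices = {
    "AAA": [100, 102, 101, 104, 103, 106],
    "BBB": [50, 51, 50.5, 52, 51.5, 53],
    "CCC": [80, 79, 81, 78, 80, 77],
    "DDD": [40, 39.5, 40.5, 39, 40, 38.5],
}
graph = correlation_graph(prices)
walk = node2vec_walk(graph, "AAA", length=5)
print(graph["AAA"], walk)
```

In a full pipeline, many such walks per node would be fed to a skip-gram model, so that stocks appearing in similar walk contexts end up close together in the embedding space.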

Can you suggest 2-3 topics for post-discussion?
Node2Vec, Stock Embeddings, Network Science

Workshop: Building a Fraud Detection Model with Feature Stores (Includes Bonus Case Study: How Shopify uses Feast to Manage its ML Features)

Presenters:
Danny Chiao, Tech Lead, Feast & Eddie Esquivel, Sr. Solutions Architect, Tecton & Abhin Chhabra, ML Platform Tech Lead, Shopify

About the Speakers:
Danny Chiao is an engineering lead at Tecton/Feast Inc working on building a next-generation feature store. Previously, Danny was a technical lead at Google working on end to end machine learning problems within Google Workspace, helping build privacy-aware ML platforms / data pipelines and working with research and product teams to deliver large-scale ML powered enterprise functionality. Danny holds a Bachelor’s degree in Computer Science from MIT.

Eddie Esquivel is a Solutions Architect at Tecton, where he helps customers implement feature stores as part of their stack for Operational ML. Prior to Tecton, Eddie was a Solutions Architect at AWS.

Abhin leads the feature store team for Shopify’s ML Platform.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

Who is this presentation for?
Product Managers, Data Scientists/ ML Engineers, ML Engineers

What you’ll learn:
You will learn how to:
– Build new features
– Automate the transformation of batch data
– Automate the transformation of streaming and real-time data
– Create training datasets
– Serve data online using DynamoDB or Redis
– Build a fraud detection system using Tecton and Feast

Pre-requisite Knowledge:
Attendees should have functional knowledge of Python, SQL and Spark, as well as familiarity with the challenges of data engineering for ML.

What is unique about this speech, from other speeches given on the topic?
Danny and Eddie are core members of the Feast and Tecton Engineering and Solutions Architect teams. They have deep expertise in working with dozens of end-users to build real-time recommendation systems using feature stores. They also have a lot of experience working on ML infrastructure at Google, AWS, and Tecton.

Abstract of Talk:
In this workshop, we’ll show how to build a real-time fraud detection system using some of the latest tooling for managing ML data pipelines. We’ll walk through the process of building, deploying, and serving real-time data pipelines, highlighting the differences between a traditional feature store (using Feast, the open source feature store) and a feature platform (using Tecton).

We’ll present common architectural patterns and walk you through building a model in three stages:
– Batch, daily computed predictions
– Online predictions using batch features
– Online predictions using real-time features
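The three stages above can be illustrated with a toy, in-memory stand-in. This sketch deliberately uses neither the Feast nor the Tecton API: the transaction log, the feature names, the dict acting as an online store and the scoring rule are all hypothetical, chosen only to show where batch and real-time features enter.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical transactions: (user_id, amount).
HISTORY = [
    ("u1", 20.0), ("u1", 25.0), ("u1", 30.0),
    ("u2", 500.0), ("u2", 450.0),
]


def compute_batch_features(transactions):
    """Batch job: aggregate per-user features from historical data."""
    amounts = defaultdict(list)
    for user_id, amount in transactions:
        amounts[user_id].append(amount)
    return {
        user: {"txn_count": len(vals), "avg_amount": mean(vals)}
        for user, vals in amounts.items()
    }


# Stages 1 and 2: batch features are computed offline, then loaded into an
# "online store". A plain dict stands in for a low-latency store here.
online_store = compute_batch_features(HISTORY)


def score(user_id, amount, store):
    """Toy fraud score combining looked-up batch features with one
    real-time feature derived from the incoming request."""
    feats = store.get(user_id, {"txn_count": 0, "avg_amount": 0.0})
    # Stage 3: real-time feature, computed at request time.
    if feats["avg_amount"] > 0:
        ratio = amount / feats["avg_amount"]
    else:
        ratio = float("inf")
    # Hypothetical rule standing in for a trained model.
    return 1.0 if ratio > 5.0 or feats["txn_count"] == 0 else 0.0


print(score("u1", 400.0, online_store))  # far above this user's average
print(score("u2", 480.0, online_store))  # in line with this user's average
```

The point of the separation is that the expensive aggregation runs on a schedule, while the request path only does a key lookup plus a cheap computation on the incoming event.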

Can you suggest 2-3 topics for post-discussion?
– Best practices for ML recommendation systems
– Building streaming and real-time data pipelines for ML
– Feature Stores: have you implemented one? Let’s share learnings

Annie En-Shiun Lee

Assistant Professor, University of Toronto

Annie En-Shiun Lee is an Assistant Professor (Teaching Stream) for the Computer Science Department at the University of Toronto. She received her PhD from the University of Waterloo in 2014 under the supervision of Professors Andrew K. C. Wong and Daniel Stashuk from the Centre for Pattern Analysis and Machine Intelligence. She has also been a visiting researcher at the Fields Institute (invited by Nancy Reid) and CUHK (invited by K. S. Leung and M. H. Wong) as well as a research scientist at VerticalScope and Stradigi AI.

Workshop: Pre-Trained Multilingual Sequence-to-Sequence Models for NMT: Tips, Tricks and Challenges

Abstract: Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase. Pre-trained multilingual sequence-to-sequence (PMSS) models, such as mBART and mT5, are pre-trained on large general data, then fine-tuned to deliver impressive results for natural language inference, question answering, text simplification and neural machine translation. This tutorial presents 1) An Introduction to Sequence-to-Sequence Pre-trained Models, 2) How to adapt pre-trained models for NMT, 3) Tips and Tricks for NMT training and evaluation, 4) Challenges/Problems faced when using these models. This tutorial will be useful for those interested in NMT, from a research as well as industry point of view.

What You’ll Learn: This tutorial will give an overview of pre-trained multilingual sequence-to-sequence models; the tips, tricks and frameworks that can be used to adapt these models for NMT, especially for low-resource languages; and the challenges faced while using these models and how to overcome them.

Technical Level: 5

Location: Toronto

Workshop: Pre-Trained Multilingual Sequence-to-Sequence Models for NMT: Tips, Tricks and Challenges

Presenter:
Annie En-Shiun Lee, Assistant Professor, University of Toronto

About the Speaker:
Annie En-Shiun Lee is an Assistant Professor (Teaching Stream) for the Computer Science Department at the University of Toronto. She received her PhD from the University of Waterloo in 2014 under the supervision of Professors Andrew K. C. Wong and Daniel Stashuk from the Centre for Pattern Analysis and Machine Intelligence. She has also been a visiting researcher at the Fields Institute (invited by Nancy Reid) and CUHK (invited by K. S. Leung and M. H. Wong) as well as a research scientist at VerticalScope and Stradigi AI.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical Level: 5/7)

What you’ll learn:
This tutorial will give an overview of pre-trained multilingual sequence-to-sequence models; the tips, tricks and frameworks that can be used to adapt these models for NMT, especially for low-resource languages; and the challenges faced while using these models and how to overcome them.

Abstract of Talk:
Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase. Pre-trained multilingual sequence-to-sequence (PMSS) models, such as mBART and mT5, are pre-trained on large general data, then fine-tuned to deliver impressive results for natural language inference, question answering, text simplification and neural machine translation. This tutorial presents 1) An Introduction to Sequence-to-Sequence Pre-trained Models, 2) How to adapt pre-trained models for NMT, 3) Tips and Tricks for NMT training and evaluation, 4) Challenges/Problems faced when using these models. This tutorial will be useful for those interested in NMT, from a research as well as industry point of view.

Workshop: An Introduction to Drift Detection

Presenters:
Ed Shee, Head of Developer Relations, Seldon & Ashley Scillitoe, Data Science Research Engineer, Seldon

About the Speakers:
With a background in cloud computing and a passion for machine learning, Ed has combined those skills and now works in the MLOps field where he heads up Developer Relations at Seldon. Organizer of Tech Ethics London and MLOps London, Ed is heavily involved in lots of developer communities and, thankfully, loves both beer and pizza.

Ashley is a data science research engineer at Seldon, where he works on developing production-ready tools for drift, adversarial and outlier detection. Prior to joining Seldon, he spent a number of years as a Research Fellow at The Alan Turing Institute. Here, he explored the use of machine learning for tackling aerospace engineering problems, with a focus on explainability and uncertainty quantification. Ashley also completed a PhD at the University of Cambridge, and is a keen proponent of open-source software.

Which talk track does this best fit into?
Workshop

What you’ll learn:
What drift detection is, why it’s important and how to get started.

Pre-requisite Knowledge:
No prior knowledge or understanding of drift detection is required (we’ll be covering that) but a basic knowledge of machine learning and some experience with Python will be helpful.

Abstract of Talk:
Although powerful, modern machine learning models can be sensitive. Seemingly subtle changes in a data distribution can destroy the performance of otherwise state-of-the-art models, which can be especially problematic when ML models are deployed in production. In this workshop, we will give a hands-on overview of drift detection, the discipline focused on detecting such changes. We will start by building an understanding of the ways in which drift can occur, and why it pays to detect it. We’ll then explore the anatomy of a drift detector, and learn how detectors can be used to detect drift in a principled manner.

You will work through a real-world example using Alibi Detect, an open-source Python library offering powerful algorithms for adversarial, outlier and drift detection. You’ll learn how to set up drift detectors, and deduce what type of drift is occurring. Since data can take many forms, such as image, text or tabular data, you’ll explore how to use existing ML models to pre-process your data into a form suitable for drift detectors. Then, to gain further insights into the causes of drift, you’ll employ state-of-the-art detectors which are able to perform fine-grained attribution to instances and features. To assess whether model performance has been affected by drift, you’ll experiment with detectors based on model uncertainty. Finally, you’ll use a novel context-aware drift detector. This takes in context (or conditioning) variables, allowing you to test for drift conditional on context that is permitted to change. We’ll discuss how this functionality can be crucial in many real-world drift detection scenarios.

This hands-on workshop is targeted at a beginner-intermediate level. No prior knowledge or understanding of drift detection is required (we’ll be covering that) but a basic knowledge of machine learning and some experience with Python will be helpful.
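To make the idea of a drift detector concrete, here is a minimal sketch of one common recipe: compare a reference sample with new data using a two-sample test statistic, and turn it into a p-value with a permutation test. It does not use Alibi Detect; the function names, the Kolmogorov-Smirnov statistic as the choice of test, the number of permutations and the synthetic data are all assumptions made for the example.

```python
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def permutation_drift_test(reference, current, n_permutations=200, seed=0):
    """Return (statistic, p_value). A small p-value suggests that the two
    samples were not drawn from the same distribution."""
    rng = random.Random(seed)
    observed = ks_statistic(reference, current)
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n_ref], pooled[n_ref:]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Synthetic data (hypothetical): the "production" sample has a shifted mean.
data_rng = random.Random(42)
reference = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [data_rng.gauss(3.0, 1.0) for _ in range(200)]

stat, p_value = permutation_drift_test(reference, shifted)
print("drift detected" if p_value < 0.05 else "no drift detected")
```

This covers only a single numeric feature. For images or text, the same test would be applied to a lower-dimensional representation produced by a pre-processing model, which is the step the abstract refers to.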

The workshop will be hands-on, based on Jupyter notebooks, and will cover the following sessions:
– Why Graph and Graph Thinking
– Graph Algorithms
– Graph Embeddings
– Graph Neural Networks

Workshop: Cancer Image Segmentation

Presenters:
Moderator (Roxana Sultan, Chief Data Officer and VP, Health, Vector Institute)
Dr. Benjamin Haibe-Kains, Senior Scientist, University Health Network
Team Fight Tumour (Jun Ma, Postdoctoral Fellow, Vector Institute / Ronald Xie, PhD Candidate, Vector Institute / Rex Ma, PhD Candidate, Vector Institute)

About the Speakers:
Roxana Sultan: Roxana Sultan is the Chief Data Officer and Vice President, Health at the Vector Institute. She leads Vector’s data strategy and its contributions to Ontario’s and Canada’s health sector. Along with our health team and partners, Roxana drives applications of AI to life sciences, fostering research, health sector and industrial sponsor projects, and initiatives to advance the health space, contributing to short-, medium-, and long-term impact achievements within the Ontario health ecosystem.
Roxana is the former Executive Director of the Provincial Council for Maternal and Child Health, where she led the implementation of evidence-based clinical quality improvement and access initiatives in obstetric, neonatal, and pediatric health services across Ontario. Her career includes leadership roles with The Hincks-Dellcrest Centre (now “SickKids Centre for Community Mental Health”), the Princess Margaret Cancer Centre in the University Health Network, the Canadian Institutes of Health Research (CIHR), Cancer Care Ontario, and the Hospital for Sick Children.
As an Adjunct Lecturer with the Institute of Health Policy, Management, and Evaluation (IHPME) at the University of Toronto (U of T), Roxana teaches a graduate-level course on intelligent medicine, machine learning, and knowledge representation. She also serves as the Vice Chair of the Board of the Canadian Cancer Society – Ontario Division.
Roxana completed her graduate education with the Department of Molecular and Medical Genetics at U of T, and holds a Masters of Health Science from IHPME.

Dr. Benjamin Haibe-Kains: Dr. Benjamin Haibe-Kains is a Senior Scientist at the Princess Margaret Cancer Centre: University Health Network, Associate Professor in the Medical Biophysics department of the University of Toronto, and Faculty Affiliate at the Vector Institute. Dr. Haibe-Kains earned his PhD in Bioinformatics at the Université Libre de Bruxelles (Belgium). Supported by a Fulbright Award, he did his postdoctoral fellowship at the Dana-Farber Cancer Institute and Harvard School of Public Health (USA). Dr. Haibe-Kains’ research focuses on the integration of high-throughput data from various sources to simultaneously analyze multiple facets of carcinogenesis. Dr. Haibe-Kains’ team is analyzing large-scale radiological and (pharmaco)genomic datasets to develop new prognostic and predictive models to improve cancer care.

Jun Ma is a Postdoctoral Fellow in the Department of Laboratory Medicine & Pathobiology at the University of Toronto. His research interests focus on the interdisciplinary areas of deep learning and medical image analysis, aiming to develop accurate, fast, and generalizable algorithms to improve healthcare. He has published seven first-author papers in top journals, such as TPAMI, TMI, and MedIA. He is the lead organizer of the MICCAI 2021-2022 FLARE Challenge.

Ronald Xie received his BSc in Microbiology and Immunology at the University of British Columbia in 2018. He then received his MPhil in Computational Biology at the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge in 2019. Ronald is currently a PhD candidate in Computational Biology and Molecular Genetics (CBMG) at the Faculty of Medicine at University of Toronto. His research interests lie in deep learning applications in electron microscopy and single cell omics.

Rex Ma is currently a Computer Science Ph.D. student at the University of Toronto. He is interested in AI in healthcare and computational biology in general, with research focusing on multi-omics integration using machine learning.

Technical level of your talk?
(Technical Level: 5/7)

Pre-requisite Knowledge:
Basic ML

Abstract of Talk:
Radiation therapy planning for head and neck cancer is a time-consuming and complex task for radiologists. AI-based tools have tremendous potential for segmenting regions of interest and optimizing therapy planning. The Vector Institute and the Cancer Digital Intelligence Program (CDI) from the Princess Margaret Cancer Centre launched a Machine Learning Challenge in June 2022 focused on cancer image segmentation.

Building on foundational work from the lab of scientist Dr. Benjamin Haibe-Kains, ten teams from Vector and UHN participated in the Challenge, leveraging RADCURE, the largest head-and-neck cancer treatment dataset of its kind, containing the imaging, treatment, demographic and clinical data of 2745 head and neck cancer patients.

In this presentation, moderated by Vector’s Roxana Sultan, Dr. Haibe-Kains will provide an overview of his preliminary work and the winning Challenge team, Fight Tumour, will describe their winning submission.

Workshop: De-Risk Your AI Efforts by Removing Friction From Your MLOps Processes

Presenters:
Catalina Herrera, Principal Sales Engineer, Dataiku & Chris Helmus, Senior Sales Engineer, Dataiku

About the Speakers:
With a passion for data and analytics, Catalina Herrera has spent her entire career helping the industry push beyond digitalization to business transformation. She’s held both educational and technical positions, worked with state-of-the-art technology solutions across multiple industry verticals, and served as a data scientist and advanced analytics consultant. Today she works with Fortune 100 companies and global technology leaders on digital transformation initiatives.

Chris Helmus has spent his career helping people and organizations embrace self-service analytics and machine learning. His expertise spans from enabling business users to become data experts to MLOps at scale with a focus on enabling collaboration. When he’s not working with data you can find Chris at music events in the Denver area.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 6/7)

What you’ll learn:
In this session, you’ll learn how Dataiku’s MLOps framework can help you to:
– Increase agility and solve difficulties in handoffs between business, data scientists, and IT
– Make your models trusted from the get-go (and, therefore, reduce risk)
– Apply model control and approvals to enable, not disable, your AI projects

Pre-requisite Knowledge:
Understanding of MLOps

Abstract of Talk:
According to McKinsey, building ML into processes enables leading organizations to increase their process efficiency by 30% or more while also increasing revenues by up to 10%. However, it’s not that simple. Several blockers prevent organizations from overcoming the difficulties encountered when industrializing AI. As a result, it can take up to nine months for teams to go from the proof of concept stage to production. In this context, how do you remove friction from your MLOps process and make your model processes trusted, agile, and controlled, so that you can finally deliver more value from your analytics and models, faster?

Workshop: Introducing the Tenstorrent Model Zoo

Presenter:
Milan Kordic, Senior Machine Learning Engineer, Tenstorrent

About the Speaker:
Milan is a Senior Machine Learning Engineer at Tenstorrent and a member of the Customer Success team. His role is to support Tenstorrent customers and the community of ML developers using Tenstorrent hardware to successfully build and deploy their AI solutions. With an educational background in Electrical and Computer Engineering and past work experiences as a Machine Learning Engineer, Data Scientist, and Analytics Engineer, Milan has strong knowledge of AI / ML systems and computer hardware architecture.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
– Overview of Tenstorrent’s hardware product line
– End-to-end overview of Tenstorrent’s software stack
– Stages of an AI model from pre-training, fine-tuning, evaluation and inference
– Hands-on demo of the Tenstorrent Model Zoo

Pre-requisite Knowledge:
– Knowledge of deep neural network models used for applications such as NLP and computer vision
– Knowledge of machine learning model training and inference
– Knowledge of computer hardware such as CPU, GPU, AI accelerators, etc. is helpful, but not required

Abstract of Talk:
Tenstorrent AI accelerator hardware is specially designed to accelerate artificial intelligence and machine learning applications, competing on a performance-per-dollar basis. For developers and engineers, having access to efficient AI computing power and an easy-to-use software API is critical for running large-scale and compute-intensive models such as BERT, GPT3, BART, ResNet50, and T5. In this workshop, we will introduce the Tenstorrent developer ecosystem including an overview of the hardware product line, the end-to-end software stack including BUDA, PyBUDA and Model Zoo, the stages of an AI model from pre-training, fine-tuning, evaluation and inference, and a hands-on demo of the Tenstorrent Model Zoo highlighting the key steps developers need to take to get their model running on Tenstorrent AI hardware.

Workshop: Troubleshooting your ML Models in Production

Presenter:
Amber Roberts, Machine Learning Engineer, Arize AI

About the Speaker:
Amber Roberts is a community-oriented Machine Learning Engineer at Arize AI, an ML observability company. In her role at Arize, Amber helps teams across all industries build ML observability into their productionalized AI environments. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 3/7)

What you’ll learn:
In this workshop, you’ll learn best practices for how to:
– Account for model, feature and actuals drift to ensure your models stay relevant
– Troubleshoot performance degradations across various cohorts
– Avoid common pitfalls from misleading evaluation metrics to imbalanced datasets

Abstract of Talk:
Taking a model from research to production is hard — and keeping it there is even harder! As more machine learning models are deployed into production, it is imperative to have tools to monitor, troubleshoot, and explain model decisions. In this workshop attendees will implement ML observability firsthand in the Arize platform to see if their fraud model is drifting, underperforming, and/or exhibiting bias. Participants will monitor, surface, resolve, and improve performance on ML models in production.

Workshop: Automating Knowledge Work with Generative AI

Presenter:
Tristan Zajonc, Co-Founder, Continual

About the Speaker:
Tristan is the cofounder of Continual, a startup focused on enabling pervasive operational AI within the enterprise. He was previously the CTO for Machine Learning at Cloudera.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
– An overview of the state-of-the-art of generative AI
– Hands-on experience using generative AI to automate knowledge work

Abstract of Talk:
With the emergence of large generative AI models such as GPT-3, DALL-E 2, and Stable Diffusion, generative AI is set to revolutionize knowledge work over the next few years. However, applying these models to solve real-world business problems remains a challenge due to the need to align models with human preferences, orchestrate models to address complex use cases, and augment models with human feedback and control. This workshop will provide an overview of the current state of generative AI and hands-on experience using generative AI to automate knowledge work.

Workshop: Building Automated Model Life Cycles To Show Data Science Business Contribution, Minimize the Impact of Regulation and Governance Requirements, and Keep the Freedom of Innovation

Presenter:
Jim Olsen, Chief Technology Officer, ModelOp

About the Speaker:
Jim Olsen serves as Chief Technology Officer at ModelOp where he leads the technical innovation and design of the ModelOp Center platform. Jim is also integral to advising ModelOp customer CIOs and CTOs on requirements to better support their IT operations as they execute on digital business strategies that often strain technology infrastructure.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
Basics of a model life cycle: What makes up a model life cycle and how do you design one
Governance: Developing an automated governance workflow
Monitoring: How to monitor models post-deployment in a flexible manner
Remediation: Creating remediation workflows that track and accelerate time to resolution

Pre-requisite Knowledge:
Will teach all skills, but some understanding of flow charts is helpful

Abstract of Talk:
In this session, ModelOp CTO Jim Olsen shows you how to design and build a model life cycle, including how to incorporate industry best practices. He also covers considerations for creating the model life cycle, who should be involved, and the types of issues that must be considered.

Workshop: Observability is Critical to MLOps

Presenter:
Marcelo Litovsky, Director of Sales Engineering, Aporia

About the Speaker:
Marcelo Litovsky is an experienced Information Technology professional with 30 years of diverse experience in Enterprise Architecture, AI, Systems and Database Management, and Programming. He has worked in multiple industries over his career: Financial Services, Entertainment, and Information Technology. Today, he serves as Director of Sales Engineering at Aporia, bringing his expertise to help Data Scientists, Machine Learning Engineers, and Business Users work together to unlock and promote the business value of their machine learning models. When he is not talking to customers or writing Python code, you can find him at the gym or preparing healthy vegan meals.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
This session will explore the steps you can take to prepare for ML observability. We will also discuss how observability helps data scientists and MLOps practitioners showcase the business value of the applications they deploy and get recognition for their hard work.

Abstract of Talk:
MLOps is bringing a lot of attention to the business impact of Machine Learning. It also introduces new challenges that cannot be efficiently addressed with DevOps. What are these challenges, and what makes MLOps so different from DevOps? They both deal with the life cycle of an application, so what is the difference? Most software applications have a pre-defined behavior. We know the data going in, and we know the data going out. Anything not matching a predefined format or schema is a problem. Machine Learning models follow the same pattern to operate, but their value diminishes as the content of the data changes. We are looking at the schema, format, and patterns describing a change in the data. This is the big difference between DevOps and MLOps: observing the data.

Most organizations have focused on the simplification, automation, and scalability of Machine Learning applications. Observability has taken a back seat. This session will explore the steps you can take to prepare for ML observability. We will also discuss how observability helps data scientists and MLOps practitioners showcase the business value of the applications they deploy and get recognition for their hard work.
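
To make "observing the data" concrete, here is a minimal sketch of one common drift measure, the Population Stability Index (PSI), which compares the distribution of a feature at training time with its distribution in production. This is an illustration only, not Aporia's implementation; the function name `psi`, the bin count, and the 0.2 alert threshold (a commonly quoted rule of thumb) are assumptions.

```python
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    # Bin edges come from the reference data; fall back to width 1.0 if it is constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clipped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Same schema and format in both samples, but the content has shifted.
training = [i / 100 for i in range(100)]
production = [0.5 + i / 200 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold, not a universal constant
drifted = psi(training, production) > DRIFT_THRESHOLD
```

Both samples would pass a schema check, since every value is a float in a plausible range, yet the PSI flags that the production distribution no longer matches the training data.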

Workshop: Bringing An AI System From Proof of Concept to Deployment and Beyond

Presenter:
James Cameron, Senior AI/ML Solutions Architect, NVIDIA

About the Speaker:
James is a Senior Solutions Architect at NVIDIA where he works with companies to design, develop, and deploy their AI systems on the edge or in the data center. Previously he was a Team Lead at Patriot One Technologies where he designed and deployed many production AI/ML systems.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
Whether they are building their first MVP or scaling out to an entire data center, attendees will come away with a better understanding of all stages of bringing an AI system from the lab into the field. Code samples will be shared for performance tuning AI models, system monitoring, and inference serving at scale.

Abstract of Talk:
With more and more companies looking to improve their products and businesses with AI, machine learning engineering is becoming an important task in moving data science from the R&D lab to the field. This workshop will walk through the various stages of creating a production grade AI system, including creating an MVP, scaling/growing systems, and performance tuning. Real world lessons will be shared as tips and tricks around common pitfalls such as sizing hardware requirements, meeting latency targets, and developing MLOps procedures and systems.

Workshop: A Guide to Putting Together a Continuous ML Stack

Presenter:
Kallie Levy, Software Engineer, Superwise

About the Speaker:
Kallie Levy is an ML and data engineer. She started out working on a data-intensive, near-real-time system for the Israeli Defense Forces. Her greatest dev passions are around high-scale data ingestion and handling data lake and warehouse architecture. Currently, she works as a software engineer at Superwise, an end-to-end machine learning observability platform, where she works on the development of the system’s entire data lake infrastructure.
In her free time, she likes to play sports, especially soccer!

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
See an example of an ML pipeline implementation using Flyte
Deploy model to an endpoint
Define monitoring policies (include some best practices)
Trigger ML pipeline to create a new model based on fresh data

Abstract of Talk:
We’ll take a hands-on dive into implementing the 1st level of MLOps maturity and performing continuous training of the model by automating our ML pipeline. We’ll start with the ML pipeline and see how we can detect performance degradation and data drift in order to trigger the pipeline and create a new model based on fresh data.
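
The trigger logic described above can be sketched as a small monitoring policy. This is a hypothetical illustration, not the workshop's code: the class `MonitoringPolicy`, the thresholds, and the `trigger_pipeline` callback are all assumptions, and the call that actually launches a pipeline run (in Flyte or any other orchestrator) is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MonitoringPolicy:
    # Hypothetical thresholds; tune them per model and metric.
    min_accuracy: float = 0.90
    max_drift_score: float = 0.20


def violations(accuracy: float, drift_score: float, policy: MonitoringPolicy) -> List[str]:
    """Return the reasons, if any, why the deployed model needs retraining."""
    reasons = []
    if accuracy < policy.min_accuracy:
        reasons.append("performance degradation")
    if drift_score > policy.max_drift_score:
        reasons.append("data drift")
    return reasons


def monitor_and_trigger(
    accuracy: float,
    drift_score: float,
    policy: MonitoringPolicy,
    trigger_pipeline: Callable[[List[str]], None],
) -> bool:
    """Call trigger_pipeline with the reasons when the policy is violated."""
    reasons = violations(accuracy, drift_score, policy)
    if reasons:
        # In a real setup this callback would start a training run on fresh data.
        trigger_pipeline(reasons)
    return bool(reasons)


# Usage: record trigger calls in a list instead of launching a real pipeline.
triggered: List[List[str]] = []
monitor_and_trigger(0.95, 0.05, MonitoringPolicy(), triggered.append)  # healthy, no trigger
monitor_and_trigger(0.80, 0.30, MonitoringPolicy(), triggered.append)  # both checks fail
```

Keeping the policy separate from the trigger makes it easy to test the thresholds without a running pipeline.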

Workshop: Launching Scotiabank's Customer Facing Chatbot for a Large Organization: From Cold Start Problem to Implementation

Presenters:
Rafal Orlowski, Director, Data Science, Scotiabank & Fabio Dutra Sarti, Senior AI/ML Product Manager, Scotiabank

About the Speakers:
Rafal Orlowski is the Director of Data Science at Scotiabank on the Corporate Functions Analytics and AI/ML Solutions team, supporting various AI and ML initiatives across the bank and its subsidiaries. He has been a part of Scotiabank for over 5 years and has worked on a variety of projects in digital, fraud, AML, and mobile banking. He has nearly 10 years of hands-on experience in Data Science and holds a Master’s in Economics from the University of Toronto.

Fabio Dutra Sarti is the Sr. AI/ML Product Manager at Scotiabank on the Corporate Functions Analytics and AI/ML Solutions team, supporting the development of the Scotiabank chatbot for the past 1.5 years. Prior to that, he spent 2 years scaling Juliet, WestJet’s virtual assistant, helping hundreds of thousands of customers. Fabio’s experience also includes launching a cryptocurrency exchange in Brazil and a real estate start-up in Boston. He holds a Master of Advanced Management degree from Yale School of Management.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 2/7)

What you’ll learn:
Examples of application of ML in healthcare; Impact of ML in healthcare technologies on patients; Potential biases in ML in Healthcare technologies

Abstract of Talk:
As organizations grow in size and complexity, they are increasingly leveraging AI to resolve customer inquiries. This talk outlines how Scotiabank built an in-house chatbot solution, from early strategic planning to launching to customers this year. First, the talk will highlight how a team of data scientists used data to prioritize intents and create a repository of training and testing utterances as a foundation for the NLU (Natural Language Understanding). Second, it will showcase the collaboration and engagement models between business, product, engineering, content, design, and accessibility to ensure that the chatbot delivers a dynamic conversational experience. Lastly, the talk will highlight how the data science team is leveraging NLP and ML to diagnose the health of the chatbot and identify new topics/data to train the chatbot on.

Workshop: ML Experimentation with DVC and VS Code

Presenter:
Alex Kim, Solutions Engineer, Iterative.ai

About the Speaker:
Alex Kim is a Solutions Engineer at Iterative. His background is in physics, software engineering, and machine learning. In the last couple of years, he became increasingly interested in the engineering side of ML projects: processes and tools needed to go from an idea to a production solution.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 4/7)

What you’ll learn:
Learn how to generate many reproducible ML experiments without leaving the context of your IDE

Abstract of Talk:
Learn how to manage your machine learning projects and make them reproducible with the open-source tool DVC and its extension for VS Code.
We will see how to track datasets and models, and how to run, compare, visualize, and track machine learning experiments right in the VS Code IDE.

Workshop: Building AI Applications with Transformers

Presenters:
Rajiv Shah, Machine Learning Engineer, Hugging Face & Andrew Jardine, Enterprise Account Executive, Hugging Face

About the Speakers:
Rajiv Shah is a leading expert on practical AI. At Hugging Face, his primary focus is on enabling enterprises to succeed with AI. He previously led data science enablement efforts across hundreds of data scientists at DataRobot and has been part of data science teams at Snorkel AI, Caterpillar, and State Farm.
He is a widely recognized speaker on AI, has received many patents, and published research papers in several domains, including sports analytics, deep learning, and interpretability. He received a Ph.D. and a J.D. from the University of Illinois at Urbana Champaign.

Andrew is an Account Executive at Hugging Face where he helps enterprise customers understand how to leverage the 🤗 open-source resources to build state of the art ML. Outside of Hugging Face Andrew is the Toronto chapter lead for MLOps.Community and has a background in NLP, MLOps and engineering.

Which talk track does this best fit into?
Workshop

Technical level of your talk?
(Technical level: 5/7)

What you’ll learn:
It’s easy to get started building advanced AI applications.

Abstract of Talk:
Transformers have ushered in some of the most innovative and exciting AI technologies, like DALL-E and GitHub Copilot. Rajiv shows you how to use open-source tools and models to solve use cases like auto-completion, semantic search, and document AI. He covers the power of embeddings, the emergence of Generative AI, and using transfer learning. He will end by touching on emerging trends around multimodal, multi-task, and large language models. The talk will also incorporate a notebook, code snippets, and paper references.
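
As a rough illustration of how embeddings power semantic search, the sketch below ranks documents by cosine similarity to a query vector. It is not taken from the workshop notebook: the three-dimensional vectors are hand-made toy values standing in for the embeddings a Transformer model would produce, and the function names are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: Sequence[float], corpus: Dict[str, Sequence[float]], top_k: int = 2) -> List[str]:
    """Return the top_k documents whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, corpus[doc]), reverse=True)
    return ranked[:top_k]


# Toy embeddings (assumed values); a real system would compute these with a model.
corpus = {
    "reset password": [0.9, 0.1, 0.0],
    "opening hours": [0.0, 0.2, 0.9],
    "change login": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "I forgot my password"
results = search(query, corpus)
```

Because similarity is computed in embedding space, "change login" ranks close to the query even though it shares no words with it.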

Talk: Model Lifecycle in Banking

Presenter:
Sarah Sun, Director Data Science, Scotiabank

About the Speaker:
TBD

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
Everything you ever wanted to know about model building at a bank, from conception to implementation, and then some!!

Talk: Deploying a Machine Learning Model in under 15 Minutes at Wealthsimple

Presenter:
Mandy Gu, Engineering Manager, Wealthsimple

About the Speaker:
Mandy leads the Machine Learning Platform and the Data Platform teams at Wealthsimple. Prior to working on MLOps and infrastructure, she was an NLP researcher in two separate conversational AI roles and a data scientist building models for the operations and client experience spaces.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
From powering money movement to fraud detection, machine learning models are critical to Wealthsimple’s core business process. This year, we built our next generation Machine Learning platform with a simple goal in mind: deploy new ML models within HOURS. This is how we scoped, designed and built our platform in just under 3 months.

Talk: What To Look For In Your Next ML Pipeline

Presenter:
Bhavani Rao, Technical Product Marketing Manager, Pachyderm

About the Speaker:
Bhavani Rao is a Technical Product Marketing Manager, responsible for product messaging and positioning at Pachyderm, a leader in data pipelining and MLOps. He has a diverse background, working with customers in Data Ops, DevOps, CI/CD, relational and NoSQL databases. A recent convert to the potential of AI/ML, Bhavani is passionate about technology and how it can be leveraged to solve customer problems. Throughout his career, Bhavani has promoted these learnings and best practices at numerous industry gatherings. He has a B.S. degree in Operations Research from Indiana University and an MBA from Columbia University.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
MLOps is not the same as DevOps. Iteration is a common theme to both methodologies, but the requirements are different. Your pipelines need to version the code AND the data for easy reproducibility and rollback. Given the enormous size of datasets, data pipelines need to scale to petabytes and automatically trigger and process only the new data, rather than executing a complete run every time. Join us for this lightning talk as we discuss what data pipelines are and how to leverage them to quickly converge on an ML model.
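
The idea of processing only the new data can be sketched with content fingerprints: each input is hashed, and a step is re-run only when the hash differs from the one recorded on the previous run. This is a simplified illustration under stated assumptions, not Pachyderm's implementation; the `seen` dictionary stands in for whatever versioned store a real pipeline would use.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(data: bytes) -> str:
    """Content hash that identifies one version of an input."""
    return hashlib.sha256(data).hexdigest()


def process_new(
    inputs: Dict[str, bytes],
    seen: Dict[str, str],
    process: Callable[[str, bytes], None],
) -> List[str]:
    """Run `process` only on inputs that are new or changed since the last run."""
    processed = []
    for name in sorted(inputs):
        digest = fingerprint(inputs[name])
        if seen.get(name) == digest:
            continue  # unchanged since the last run, skip it
        process(name, inputs[name])
        seen[name] = digest  # record the version that was processed
        processed.append(name)
    return processed


# Usage: the second run touches only the file whose content changed.
seen: Dict[str, str] = {}
data = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
first_run = process_new(data, seen, lambda name, blob: None)
data["b.csv"] = b"4,5,7"
second_run = process_new(data, seen, lambda name, blob: None)
```

Recording the hash alongside the result also gives a simple form of data versioning, since each output can be traced back to the exact input content that produced it.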

Talk: Retraining won't Fix your Model (Always)

Presenter:
Oren Razon, Co-Founder & CEO, Superwise

About the Speaker:
Oren is the co-founder and CEO of Superwise, the leading platform for model observability. With over 15 years of experience leading the development, deployment, and scaling of ML products, Oren is an expert ML practitioner specializing in MLOps tools and practices. Previously, Oren managed machine learning activities at Intel’s ML center and operated a machine learning boutique consulting agency helping leading tech companies such as Sisense, Gong, AT&T, and others, to build their machine learning-based products and infrastructure.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
It’s practically dogma today that a model’s best day in production will be its first day in production. Over time model performance degrades, and there are many variables that can cause decay, from real-world behavior changes to data drifts. When models misbehave, we often turn to retraining to fix the problem, but retraining is not always the best or only solution out there. In this session we’ll take a crash course in alternative techniques.

Talk: 7 Questions for Data Scientists

Presenter:
Geoffrey Hunter, Lead Data Scientist, SpotHero

About the Speaker:
Geoffrey is passionate about forming end-to-end, product-focused Data Science teams that deliver high impact results. After his post doc, he was a Data Science consultant at different companies and then moved onto leading Data Science teams. He acts to contextualize Data Science opportunities for senior leadership and then mobilizes and mentors the data science teams to focus on understanding and solving problems.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
Geoffrey will share the 7 questions he asks on a daily basis to rapidly qualify and frame new Data Science problems. This framework can be applied to understand new opportunities as well as existing problems, helping to eliminate noise and focus one’s efforts.

Talk: Recommender Systems at Loblaw Digital

Presenter:
Kai Luo, Senior Applied Scientist, Loblaw Digital

About the Speaker:
Kai works as a Senior Applied Scientist at Loblaw Digital, leading development of the core recommender systems used in personalization use cases across all lines of business. Prior to that, he completed a master’s degree at the University of Toronto, with a thesis relating to conversational recommendation.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
Loblaw Companies Ltd is a retailing conglomerate that is the largest grocery retailer in Canada. Its subsidiaries include over 20 supermarket banners, such as Real Canadian Superstore, No Frills, and T&T. E-commerce, particularly for grocery, has become a significant part of the business, and personalization use cases play an important role in that domain. In this talk, Kai will discuss challenges relating to modeling customers’ behaviors and grocery products’ latent representation, and how we iterate our system to solve these challenges.

Talk: The Transformative Role of AI/ML in Heavy Industries

Presenter:
Rex Lam, Director, Machine Learning Platform, Autodesk

About the Speaker:
Rex Lam leads the Machine Learning Platform team at Autodesk, building platform capabilities and operational tools that support the full ML development cycle and aim to make ML solutions faster, more trusted, and more scalable.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
AI/ML represents opportunities for Autodesk to drive insights & innovative solutions in architecture, design, and manufacturing tools.

Talk: Digital Twin as A Tool for Industrial Asset Management

Presenter:
Robinson Garcia, R&D Project Manager & Technology Specialist, Petrobras

About the Speaker:
Robinson Garcia graduated in Mechanical Engineering (2006) and did his MBA at the Rotman School of Management (2018). He currently works at the Petrobras Research Center (Cenpes), leading cooperation agreements with universities and startups to develop solutions for industrial asset management.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
Valued at 500 million CAD, Asset360 is a solution that improves efficiency and reduces the maintenance backlog of offshore installations at Petrobras. The project started back in 2018 after successful experiments with semantic segmentation, and after the signature of two cooperation terms with a partner university. We have built a Street View-like platform and an information extraction solution over the past two years (4,000+ registered users). Currently, we are experimenting with human-in-the-loop learning, recommendation systems, and multi-objective optimization to increase value creation. Our moonshot is to create a two-sided platform that reduces the distance between specialized developer partners (research labs and startups) and internal consumers.

Talk: Monitoring Unstructured Models in Production

Presenter:
Amber Roberts, Machine Learning Engineer, Arize AI

About the Speaker:
Amber Roberts is a community-oriented Machine Learning Engineer at Arize AI, an ML observability company. In her role at Arize, Amber helps teams across all industries build ML observability into their productionalized AI environments. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile. When Amber isn’t expertly teaching ML observability best practices, you can find Amber playing with her two puppies, Rusty and Sully, on Florida’s warm beaches.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
From images and video to natural language and audio, unstructured data coupled with machine learning can unlock deeper AI potential and ROI for many organizations and use cases. Embeddings are the core of how deep learning models represent structures and are fundamental to how the next generation of ML models work.

Join this talk to:
– Troubleshoot a sentiment classification model in production
– Learn about emerging techniques like UMAP to transform unstructured data into embeddings that can be more efficiently processed by ML models
– Implement new technologies to monitor and improve models in production

Talk: AI & Sustainability: A $50 Trillion Opportunity

Presenter:
Andrea Ruotolo, Global Head, Sustainability / ESG, Rockwell Automation

About the Speaker:
Andrea is the Global Head of Sustainability/ESG at Rockwell Automation, the world’s largest industrial automation company, with responsibility for advancing innovation in sustainability for Rockwell’s customers, which include Fortune 100 companies in energy and manufacturing, representing millions of employees and hundreds of billions of dollars in annual revenues. Andrea is a passionate evangelist for the role AI/ML can play in dramatically improving the sustainability of the industrial sector.

Across her nearly two decades of experience in leading technology innovation, in a career spanning Europe, Asia, and the Americas, Andrea has held multiple senior executive roles focused on applying advanced technologies to solve the sustainability challenge. She has served as co-founder and entrepreneur in smart grid consulting, global lead in the world’s largest engineering services firm in the energy sector, and senior director at a major utility.

As well as her Fulbright Doctorate in an ESG analysis of sustainable energy systems, Andrea holds a B.A. from the University of La Plata in Argentina, an M.Sci. in Aeronautical and Aerospace Engineering from Madrid Polytechnic, and certification in Digital Business Strategy and AI from MIT Sloan School of Management.

Which talk track does this best fit into?
Lightning Ignite Talk

Abstract of Talk:
The financial and business community have already caught on to the essential importance of sustainability. Investors now call for better practices and reporting on Environmental, Social, and Governance (ESG) performance metrics. According to Bloomberg Intelligence, growth in ESG investing is fast becoming the new norm, with ESG investments projected to exceed USD 50 trillion by 2025 – more than 1/3 of all global assets under management. This movement of funds to ESG represents a massive, once-in-a-generation transition to an entirely new economy.

Sustainability is incredibly complex, involving billions of moving parts and decisions. It starts at the edge, where exabytes of data are flowing from real-time sensors and controls in factories and power plants, which aggregate up to the top-level decision makers in companies, which aggregate up to the massive funds that hold portfolios of those companies, and to government regulators and policymakers. AI is critical in analyzing those exabytes of data and enables closed-loop optimization to reduce energy, water, and waste.

In this session, we’ll explore the top 3 needs and opportunities for AI to catalyze change toward more sustainable companies, economies, and societies.

Join our Free Start-up Showcase and Career Fair

Who Attends

TMLS 2021 Technical Background

Expert: 12.2%
Advanced: 41.3%
Intermediate: 37.4%
Beginner: 9.1%

Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, principles, and practices for achieving ROI.

Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: Will have the opportunity to network virtually and meet over 60 Top AI Start-ups and companies during the EXPO & Career Fair.

What is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.

The result is a fun and engaging five-minute presentation.

You can see all our speakers and full agenda here
