Professor, Cheriton School of Computer Science, University of Waterloo
Director, Head of Apple Knowledge Platform, Apple
Ihab Ilyas is a professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on data quality at the University of Waterloo. He is currently on leave at Apple to lead the Apple Knowledge Platform team. His main research focuses on the areas of data science and data management, with special interest in data quality and integration, managing uncertain data, machine learning for data curation, and information extraction. Ihab is a co-founder of Tamr, a startup focusing on large-scale data integration, and the co-founder of Inductiv (acquired by Apple), a Waterloo-based startup using AI for structured data cleaning. He is an ACM Fellow and IEEE Fellow, a recipient of the Ontario Early Researcher Award, a Cheriton Faculty Fellowship, an NSERC Discovery Accelerator Award, and a Google Faculty Award. Ihab was an elected member of the VLDB Endowment board of trustees (2016-2021), elected SIGMOD vice chair (2016-2021), an associate editor of the ACM Transactions on Database Systems (2014-2020), and an associate editor of Foundations of Database Systems. He holds a PhD in computer science from Purdue University, West Lafayette.
Talk: Saga: Continuous Construction and Serving of Large Scale Knowledge Graphs
Abstract: In this talk, I present Saga, an end-to-end platform for incremental and continuous construction of large-scale knowledge graphs we built at Apple. Saga demonstrates the complexity of building such a platform in industrial settings with strong consistency, latency, and coverage requirements. In the talk, I will discuss challenges around the following: building source adapters for ingesting heterogeneous data sources; building entity linking and fusion pipelines for constructing coherent knowledge graphs that adhere to a common controlled vocabulary; updating the knowledge graphs with real-time streams; and finally, exposing the constructed knowledge via a variety of services. Graph services include: low-latency query answering; graph analytics; ML-based entity disambiguation and semantic annotation; and other graph-embedding services to power multiple downstream applications. Saga is used in production at large scale to power a variety of user-facing knowledge features.
What You’ll Learn: Complexity of building large scale knowledge graphs
Track: Technical
Technical Level: 5
Location: Seattle
Presenter:
Ihab Ilyas, Professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on data quality at the University of Waterloo
About the Speaker:
Ihab Ilyas is a professor in the Cheriton School of Computer Science and the NSERC-Thomson Reuters Research Chair on data quality at the University of Waterloo. He is currently on leave at Apple to lead the Apple Knowledge Platform team. His main research focuses on the areas of data science and data management, with special interest in data quality and integration, managing uncertain data, machine learning for data curation, and information extraction. Ihab is a co-founder of Tamr, a startup focusing on large-scale data integration, and the co-founder of Inductiv (acquired by Apple), a Waterloo-based startup using AI for structured data cleaning. He is an ACM Fellow and IEEE Fellow, a recipient of the Ontario Early Researcher Award, a Cheriton Faculty Fellowship, an NSERC Discovery Accelerator Award, and a Google Faculty Award. Ihab was an elected member of the VLDB Endowment board of trustees (2016-2021), elected SIGMOD vice chair (2016-2021), an associate editor of the ACM Transactions on Database Systems (2014-2020), and an associate editor of Foundations of Database Systems. He holds a PhD in computer science from Purdue University, West Lafayette.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 5/7)
Are there any industries (in particular) that are relevant for this talk?
Information Technology & Service
What is the main core message (learning) you want attendees to take away from this talk?
Complexity of building large scale knowledge graphs.
Abstract of Talk:
In this talk, I present Saga, an end-to-end platform for incremental and continuous construction of large-scale knowledge graphs we built at Apple. Saga demonstrates the complexity of building such a platform in industrial settings with strong consistency, latency, and coverage requirements. In the talk, I will discuss challenges around the following: building source adapters for ingesting heterogeneous data sources; building entity linking and fusion pipelines for constructing coherent knowledge graphs that adhere to a common controlled vocabulary; updating the knowledge graphs with real-time streams; and finally, exposing the constructed knowledge via a variety of services. Graph services include: low-latency query answering; graph analytics; ML-based entity disambiguation and semantic annotation; and other graph-embedding services to power multiple downstream applications. Saga is used in production at large scale to power a variety of user-facing knowledge features.
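To make the linking-and-fusion idea concrete, here is a minimal, hypothetical sketch (not Apple's implementation). It uses exact alias lookup against a tiny controlled vocabulary as a stand-in for the learned linking models a production platform would use, and last-writer-wins merging in place of real source-fusion logic; all records and entity ids are made up.

```python
from collections import defaultdict

# Hypothetical toy records from two heterogeneous sources.
SOURCE_A = [{"name": "Toronto Raptors", "founded": 1995}]
SOURCE_B = [{"title": "Raptors", "arena": "Scotiabank Arena"}]

# A tiny controlled vocabulary: canonical entity id -> known aliases.
VOCAB = {"Q1066": {"toronto raptors", "raptors"}}

def link(mention):
    """Map a surface mention to a canonical entity id via alias lookup."""
    m = mention.lower()
    for entity_id, aliases in VOCAB.items():
        if m in aliases:
            return entity_id
    return None

def fuse(records):
    """Group source records by linked entity and merge their attributes."""
    fused = defaultdict(dict)
    for rec in records:
        mention = rec.get("name") or rec.get("title")
        entity_id = link(mention)
        if entity_id is None:
            continue  # unlinked records would go to a review/curation queue
        for key, value in rec.items():
            if key not in ("name", "title"):
                # last-writer-wins; a real fusion stage scores and reconciles sources
                fused[entity_id][key] = value
    return dict(fused)

print(fuse(SOURCE_A + SOURCE_B))
# {'Q1066': {'founded': 1995, 'arena': 'Scotiabank Arena'}}
```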
CEO, Claypot AI
Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning. Previously, she was with Snorkel AI and NVIDIA. She teaches CS 329S: Machine Learning Systems Design at Stanford. She’s the author of Designing Machine Learning Systems, an Amazon bestseller in AI. She has also written four bestselling Vietnamese books.
Talk: Real-time Machine Learning: Architecture and Challenges
Abstract: Fresh data beats stale data for machine learning applications. This talk discusses the value of fresh data, as well as the different architectures for online prediction and the challenges they bring.
What You’ll Learn: Fresh data beats stale data for machine learning applications
Track: Technical
Technical Level: 5
Location: San Francisco
Presenter:
Chip Huyen, CEO, Claypot AI
About the Speaker:
Chip Huyen is a co-founder of Claypot AI, a platform for real-time machine learning. Previously, she was with Snorkel AI and NVIDIA. She teaches CS 329S: Machine Learning Systems Design at Stanford. She’s the author of Designing Machine Learning Systems, an Amazon bestseller in AI. She has also written four bestselling Vietnamese books.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 5/7)
Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Computer Software, Information Technology & Service, Insurance, Marketing & Advertising
What is the main core message (learning) you want attendees to take away from this talk?
Fresh data beats stale data for machine learning applications
Abstract of Talk:
Fresh data beats stale data for machine learning applications. This talk discusses the value of fresh data, as well as the different architectures for online prediction and the challenges they bring.
Professor, University of Toronto
Anne Martel is a Professor in Medical Biophysics at the University of Toronto, the Tory Family Chair in Oncology at Sunnybrook Research Institute, and a Faculty Affiliate at the Vector Institute, Toronto. Her research program is focused on medical image and digital pathology analysis, particularly on the development of self-supervised and weakly supervised methods for segmentation, diagnosis, and prediction/prognosis. In 2006 she co-founded Pathcore, a software company developing complete workflow solutions for digital pathology.
Dr. Martel is an active member of the medical image analysis community and is a fellow of the MICCAI Society, which represents engineers and computer scientists working in this field. She has served as a board member of MICCAI and is currently on the editorial board of Medical Image Analysis, one of the leading journals in the field.
Talk: Artificial Intelligence And Digital Pathology: Making The Most of Limited Annotated Data
Abstract: Obtaining large datasets with detailed annotations for medical imaging AI projects is a time-consuming and expensive process, as it usually requires the input of expert radiologists and pathologists. Collecting data to train outcome prediction models is even more challenging, as the number of patients with both imaging and follow-up data may be small, and only weak labels are available.
This talk will describe several semi-supervised and self-supervised approaches which can make more efficient use of small and/or weakly labelled datasets. The focus will be on digital pathology, but the methods described are applicable to any medical imaging modality.
What You’ll Learn: Self-supervision and smart sampling strategies are essential in digital pathology
Track: Advanced Technical/Research
Technical Level: 6
Location: Toronto
Presenter:
Anne Martel, Professor, University of Toronto
About the Speaker:
Anne Martel is a Professor in Medical Biophysics at the University of Toronto, the Tory Family Chair in Oncology at Sunnybrook Research Institute, and a Faculty Affiliate at the Vector Institute, Toronto. Her research program is focused on medical image and digital pathology analysis, particularly on the development of self-supervised and weakly supervised methods for segmentation, diagnosis, and prediction/prognosis. In 2006 she co-founded Pathcore, a software company developing complete workflow solutions for digital pathology.
Dr. Martel is an active member of the medical image analysis community and is a fellow of the MICCAI Society, which represents engineers and computer scientists working in this field. She has served as a board member of MICCAI and is currently on the editorial board of Medical Image Analysis, one of the leading journals in the field.
Which talk track does this best fit into?
Advanced Technical / Research
Technical level of your talk?
(Technical Level: 6/7)
What you’ll learn:
Self-supervision and smart sampling strategies are essential in digital pathology
Abstract of Talk:
Obtaining large datasets with detailed annotations for medical imaging AI projects is a time-consuming and expensive process, as it usually requires the input of expert radiologists and pathologists. Collecting data to train outcome prediction models is even more challenging, as the number of patients with both imaging and follow-up data may be small, and only weak labels are available.
This talk will describe several semi-supervised and self-supervised approaches which can make more efficient use of small and/or weakly labelled datasets. The focus will be on digital pathology, but the methods described are applicable to any medical imaging modality.
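The specific methods are the subject of the talk; purely as a generic illustration of one semi-supervised idea (pseudo-labelling), the hypothetical sketch below trains a classifier on a small labelled set and promotes only high-confidence predictions on unlabelled data to training labels. The features, labels, and confidence threshold are all made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for image-patch features: a small labelled set and a larger
# unlabelled pool (e.g., tiles extracted from whole-slide images).
X_labelled = rng.normal(size=(50, 16))
y_labelled = (X_labelled[:, 0] > 0).astype(int)
X_unlabelled = rng.normal(size=(500, 16))

model = LogisticRegression(max_iter=1000).fit(X_labelled, y_labelled)

# Pseudo-labelling: keep only predictions the model is confident about.
proba = model.predict_proba(X_unlabelled)
confident = proba.max(axis=1) > 0.9
X_pseudo = X_unlabelled[confident]
y_pseudo = proba[confident].argmax(axis=1)

# Retrain on the enlarged training set; in practice this is iterated and
# the threshold is tuned to limit confirmation bias.
X_all = np.vstack([X_labelled, X_pseudo])
y_all = np.concatenate([y_labelled, y_pseudo])
model = LogisticRegression(max_iter=1000).fit(X_all, y_all)
print(f"{confident.sum()} unlabelled patches promoted to pseudo-labels")
```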
Senior Research Scientist, Sony AI
Varun Kompella is currently a senior research scientist at Sony AI. He earned his Master of Science degree in informatics with a specialization in graphics, vision and robotics from the Institut National Polytechnique de Grenoble (INRIA Grenoble), and a Ph.D. degree from Università della Svizzera Italiana (IDSIA Lugano), Switzerland, working with Prof. Juergen Schmidhuber. In his thesis work he developed algorithms that use the slowness principle for driving exploration in reinforcement learning agents. After completing his Ph.D., he worked as a postdoctoral researcher at the Institute for Neural Computation (INI), Germany. His research contributions have led to several patents and publications in peer-reviewed journals and conference proceedings.
Talk: Outracing Champion Gran Turismo Drivers With Deep Reinforcement Learning
Abstract: Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics.
In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
What You’ll Learn: We demonstrate the possibilities and challenges of using deep RL techniques to control complex dynamical systems in domains such as Gran Turismo where agents must respect imprecisely defined human norms.
Track: Technical / Research
Technical Level: 7
Location: Ottawa
Presenter:
Varun Raj Kompella, Senior Research Scientist, Sony AI
About the Speaker:
Varun Kompella is currently a senior research scientist at Sony AI. He earned his Master of Science degree in informatics with a specialization in graphics, vision and robotics from the Institut National Polytechnique de Grenoble (INRIA Grenoble), and a Ph.D. degree from Università della Svizzera Italiana (IDSIA Lugano), Switzerland, working with Prof. Juergen Schmidhuber. In his thesis work he developed algorithms that use the slowness principle for driving exploration in reinforcement learning agents. After completing his Ph.D., he worked as a postdoctoral researcher at the Institute for Neural Computation (INI), Germany. His research contributions have led to several patents and publications in peer-reviewed journals and conference proceedings.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical Level: 7/7)
What you’ll learn:
We demonstrate the possibilities and challenges of using deep RL techniques to control complex dynamical systems in domains such as Gran Turismo where agents must respect imprecisely defined human norms.
Abstract of Talk:
Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics.
In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
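The actual reward used for Gran Turismo Sophy is part of the published work and is far more nuanced; the toy sketch below only illustrates the general shape of such a composite reward, with hypothetical telemetry fields and made-up weights that trade off forward progress against track-limit and sportsmanship penalties.

```python
from dataclasses import dataclass

@dataclass
class StepInfo:
    """Hypothetical per-step telemetry a racing simulator might expose."""
    course_progress_m: float   # metres of track progressed this step
    off_course: bool           # left the track surface
    wall_contact: bool         # hit a wall or barrier
    at_fault_collision: bool   # judged responsible for car-to-car contact

def reward(info: StepInfo) -> float:
    """Toy composite reward: reward progress, penalise behaviour that
    violates track limits or (under-specified) sportsmanship norms.
    The weights are invented; tuning them is much of the real problem."""
    r = 1.0 * info.course_progress_m
    if info.off_course:
        r -= 5.0
    if info.wall_contact:
        r -= 2.0
    if info.at_fault_collision:
        r -= 10.0
    return r

print(reward(StepInfo(3.2, False, False, True)))  # 3.2 - 10.0 = -6.8
```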
Software Engineer, Google Brain
Bo Chang is a software engineer at Google Brain, based in Toronto, Canada. Prior to that, he was a machine learning researcher at Borealis AI. He finished his Ph.D. in statistics at the University of British Columbia.
Talk: Latent User Intent Modeling in Recommender Systems
Abstract: Current sequential recommender systems mainly rely on users’ item-level interaction history to capture topical interests and lack a high-level understanding of user intent. It is challenging to explicitly define and enumerate all possible user intents. We propose to use latent variable models to capture user intents as latent variables through encoding and decoding user behavior signals, with an application to a large industrial recommender system.
What You’ll Learn: How to better model user intent in recommender systems using a latent variable model.
Track: Advanced Technical / Research
Technical Level: 7
Location: Toronto
Presenter:
Bo Chang, Software Engineer, Google Brain
About the Speaker:
Bo Chang is a software engineer at Google Brain, based in Toronto, Canada. Prior to that, he was a machine learning researcher at Borealis AI. He finished his Ph.D. in statistics at the University of British Columbia.
Which talk track does this best fit into?
Advanced Technical / Research
Technical level of your talk?
(Technical Level: 7/7)
What you’ll learn:
How to better model user intent in recommender systems using a latent variable model.
Abstract of Talk:
Current sequential recommender systems mainly rely on users’ item-level interaction history to capture topical interests and lack a high-level understanding of user intent. It is challenging to explicitly define and enumerate all possible user intents. We propose to use latent variable models to capture user intents as latent variables through encoding and decoding user behavior signals, with an application to a large industrial recommender system.
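As a rough illustration of the latent-variable idea (not the production model described in the talk), the sketch below encodes a user's item-interaction vector into a latent "intent", decodes it back, and trains with a reconstruction-plus-KL objective; the item count, dimensions, and beta weight are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntentVAE(nn.Module):
    """Toy latent-variable model: encode a user behaviour vector into a
    latent 'intent', then decode it back. Purely illustrative; real
    recommenders condition sequence models on such latents."""
    def __init__(self, n_items=100, latent_dim=8):
        super().__init__()
        self.encoder = nn.Linear(n_items, 32)
        self.mu = nn.Linear(32, latent_dim)
        self.logvar = nn.Linear(32, latent_dim)
        self.decoder = nn.Linear(latent_dim, n_items)

    def forward(self, x):
        h = F.relu(self.encoder(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.decoder(z), mu, logvar

model = IntentVAE()
x = (torch.rand(4, 100) < 0.05).float()   # 4 users' item-interaction vectors
logits, mu, logvar = model(x)
recon = F.binary_cross_entropy_with_logits(logits, x)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + 0.1 * kl                    # beta-weighted ELBO
loss.backward()
print(float(loss))
```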
Machine Learning Engineer, DagsHub
I’m a second-year undergraduate studying Data Science at Purdue University. I also work part-time as a Machine Learning Engineer @ DagsHub.
I love research – especially within academia! My interests lie firmly within Machine Vision, NLP & Cybersecurity; so far, I’ve published some peer-reviewed papers and have pending patents within these domains.
Talk: Interpretability Tools are Feedback Loops
Abstract: Fundamentally, Machine Learning as a field is designed to emulate the way humans think; hence, *neural* networks. When we train our models, we use loss functions to measure their success and optimizers to improve it. While these functions make sense mathematically, they are far from intuitive and do little to explain what happened behind the scenes. It’s hard to pick the correct functions, and performing huge grid searches to hyperparameter-tune at scale is as logical as brute-forcing an SHA-256 hash.
On the other hand, interpretability techniques can’t really be used in a training context, but they are intuitive in helping us understand how a given model interprets a set of data.
This talk aims to bridge the gap between the two, connecting them within a single training loop to maximize training effectiveness without disproportionately increasing compute or training time. Making training intuitive to how humans learn should help develop models that actually work, without resorting to “useless” training.
I aim to showcase – with a practical demonstration – learning techniques to build feedback loops wherein interpretability is used to better optimize a training sequence. I also aim to discuss how this carries forward to complex architectures, and a potential approach for their relevant implementation.
Structurally, the talk will provide an overview of machine interpretability and a brief overview of optimizers and loss functions before jumping into the implementation walkthrough of a case study. The case study uses TensorFlow, but the approach can generally be applied to any desired framework.
What You’ll Learn: If, by the end of my presentation, attendees are able to identify techniques for applying the proposed approach within their internal systems, or find themselves motivated to further research the ideas presented, I’d consider the talk a success.
Track: Technical
Technical Level: 6
Location: West Lafayette, Indiana, United States
Presenter:
Jinen Setpal, Machine Learning Engineer, DagsHub
About the Speaker:
I’m a second-year undergraduate studying Data Science at Purdue University. I also work part-time as a Machine Learning Engineer @ DagsHub.
I love research – especially within academia! My interests lie firmly within Machine Vision, NLP & Cybersecurity; so far, I’ve published some peer-reviewed papers and have pending patents within these domains.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 6/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Researchers within Academia
Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers, Researchers
What you’ll learn:
Most guides on interpretable machine learning, while focusing on the seemingly black-box nature of models, fail to address the limitation that interpretability is only applied post-training. It is treated as a final check to ensure the model isn’t learning something drastically different from what the researcher expects, rather than as part of the training feedback loop. I aim to address this.
What is the main core message (learning) you want attendees to take away from this talk?
If, by the end of my presentation, attendees are able to identify techniques for applying the proposed approach within their internal systems, or find themselves motivated to further research the ideas presented, I’d consider the talk a success.
Pre-requisite Knowledge:
It’s a technical presentation, and it requires participants to be familiar, at a high level, with model optimization techniques and machine interpretability. They should also be very familiar with the standard classification pipeline of a feedforward neural network.
What is unique about this speech, from other speeches given on the topic?
I work extensively on ML Reproducibility. In fact, it is the fundamental work done within my research grant funded by Google. Time and again, I find that papers tend to document the end-result and the problem statement extensively, but not everything in between. That’s where the real learning takes place, understanding what DIDN’T work and WHY.
This is rarely ever documented. Papers list working parameters, both functions and hyperparameters, but fail to explain why. I’ve struggled a lot with this and hope to relay potential solutions throughout my presentation.
Abstract of Talk:
Fundamentally, Machine Learning as a field is designed to emulate the way humans think; hence, *neural* networks. When we train our models, we use loss functions to measure their success and optimizers to improve it. While these functions make sense mathematically, they are far from intuitive and do little to explain what happened behind the scenes. It’s hard to pick the correct functions, and performing huge grid searches to hyperparameter-tune at scale is as logical as brute-forcing an SHA-256 hash.
On the other hand, interpretability techniques can’t really be used in a training context, but they are intuitive in helping us understand how a given model interprets a set of data.
This talk aims to bridge the gap between the two, connecting them within a single training loop to maximize training effectiveness without disproportionately increasing compute or training time. Making training intuitive to how humans learn should help develop models that actually work, without resorting to “useless” training.
I aim to showcase – with a practical demonstration – learning techniques to build feedback loops wherein interpretability is used to better optimize a training sequence. I also aim to discuss how this carries forward to complex architectures, and a potential approach for their relevant implementation.
Structurally, the talk will provide an overview of machine interpretability and a brief overview of optimizers and loss functions before jumping into the implementation walkthrough of a case study. The case study uses TensorFlow, but the approach can generally be applied to any desired framework.
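The case study itself is the speaker's; the sketch below is only one hedged way to wire an interpretability signal into a TensorFlow training step. It assumes per-example masks of "irrelevant" input regions are available (an assumption, not something the talk states), and it uses plain gradient saliency as a stand-in for richer interpretability methods, penalizing saliency mass that falls on the masked regions.

```python
import tensorflow as tf

# Toy model and data; the mask marks input regions a saliency map
# should *not* focus on (assumed to be available per example).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),
])
opt = tf.keras.optimizers.Adam(1e-3)
x = tf.random.normal((32, 28, 28))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
irrelevant_mask = tf.cast(tf.random.uniform((32, 28, 28)) > 0.8, tf.float32)

def train_step(x, y, mask, saliency_weight=0.1):
    with tf.GradientTape() as tape:
        with tf.GradientTape() as inner:
            inner.watch(x)
            logits = model(x, training=True)
            task_loss = tf.reduce_mean(
                tf.keras.losses.sparse_categorical_crossentropy(
                    y, logits, from_logits=True))
        saliency = tf.abs(inner.gradient(task_loss, x))   # simple gradient saliency
        penalty = tf.reduce_mean(saliency * mask)         # saliency mass on irrelevant pixels
        loss = task_loss + saliency_weight * penalty      # interpretability enters the loss
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
    return loss

print(float(train_step(x, y, irrelevant_mask)))
```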
Can you suggest 2-3 topics for post-discussion?
Information Retrieval, Explainable AI, relevance of the above in product development
Senior Machine Learning Scientist, Layer 6 AI
Jesse is a Senior Machine Learning Scientist at Layer 6 AI within TD, and is the Team Lead for Credit Risk. His applied work centers on building machine learning models in high risk and highly regulated domains. Jesse leads research on privacy enhancing technologies for machine learning including topics of Federated Learning and Differential Privacy.
Talk: Navigating the Tradeoff Between Privacy and Fairness in ML
Abstract: As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy enhancing technologies often worsens unfair tendencies in models. In this talk we address the intersection of privacy and fairness in machine learning, and offer research-based solutions for navigating the tradeoffs.
What You’ll Learn: Applying privacy enhancing technologies can increase bias and unfairness in ML models. Practitioners need to consider the intersection of these important ethical ideas.
Track: Technical
Technical Level: 5
Location: East York
Presenter:
Jesse Cresswell, Senior Machine Learning Scientist, Layer 6 AI
About the Speaker:
Jesse is a Senior Machine Learning Scientist at Layer 6 AI within TD, and is the Team Lead for Credit Risk. His applied work centers on building machine learning models in high risk and highly regulated domains. Jesse leads research on privacy enhancing technologies for machine learning including topics of Federated Learning and Differential Privacy.
Which talk track does this best fit into?
Technical
Technical level of your talk?
(Technical Level: 5/7)
What you’ll learn:
Applying privacy enhancing technologies can increase bias and unfairness in ML models. Practitioners need to consider the intersection of these important ethical ideas.
Abstract of Talk:
As machine learning becomes more widespread throughout society, aspects including data privacy and fairness must be carefully considered, and are crucial for deployment in highly regulated industries. Unfortunately, the application of privacy enhancing technologies often worsens unfair tendencies in models. In this talk we address the intersection of privacy and fairness in machine learning, and offer research-based solutions for navigating the tradeoffs.
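The research solutions are the subject of the talk; as a minimal illustration of the kind of audit that exposes the tradeoff, the sketch below computes per-group accuracy and the gap between groups for a baseline model versus a simulated privately trained one. All data and predictions are synthetic placeholders, not results from any real privacy mechanism.

```python
import numpy as np

def accuracy_gap(y_true, y_pred, group):
    """Per-group accuracy and the max gap between groups: a simple check
    for whether a (e.g., differentially private) model degrades some
    groups more than others."""
    accs = {}
    for g in np.unique(group):
        idx = group == g
        accs[int(g)] = float((y_true[idx] == y_pred[idx]).mean())
    return accs, max(accs.values()) - min(accs.values())

# Hypothetical predictions from a baseline model and a "private" model.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
group = rng.integers(0, 2, size=1000)            # sensitive attribute
baseline_pred = np.where(rng.random(1000) < 0.9, y_true, 1 - y_true)
# Simulate a private model whose extra errors fall mostly on group 1.
noise = rng.random(1000) < np.where(group == 1, 0.25, 0.05)
private_pred = np.where(noise, 1 - y_true, y_true)

print("baseline:", accuracy_gap(y_true, baseline_pred, group))
print("private :", accuracy_gap(y_true, private_pred, group))
```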
Senior Data Scientist, Anaconda
Sophia Yang is a Senior Data Scientist at Anaconda, Inc., where she uses data science to facilitate decision-making for various departments across the company. She volunteers as a Project Incubator at NumFOCUS to help Open Source Scientific projects grow. She is also the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and Ph.D. in Educational Psychology from The University of Texas at Austin.
Talk: PyScript for Data Science
Abstract: Are you a data scientist or a developer who mostly uses Python? Are you jealous of developers who write Javascript code and build fancy websites in a browser? How nice would it be if we could write websites in Python? PyScript makes it possible! PyScript allows users to write Python in the browser. In this talk, I will introduce PyScript and discuss what PyScript means for data scientists, how PyScript might change the way data scientists work, and how PyScript can be incorporated into the data science workflow.
What You’ll Learn: Use PyScript to run Python in Your HTML
Track: Technical
Technical Level: 4
Location: Austin
Presenter:
Sophia Yang, Senior Data Scientist, Anaconda
About the Speaker:
Sophia Yang is a Senior Data Scientist at Anaconda, Inc., where she uses data science to facilitate decision-making for various departments across the company. She volunteers as a Project Incubator at NumFOCUS to help Open Source Scientific projects grow. She is also the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She holds an M.S. in Statistics and Ph.D. in Educational Psychology from The University of Texas at Austin.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 3/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ML Engineers, and High-level Researchers
What you’ll learn:
PyScript was announced early this year. There are not many tutorials or much content online yet.
Pre-requisite Knowledge:
Knowledge of Python is recommended
What is unique about this speech, from other speeches given on the topic?
I will introduce PyScript from a data science perspective.
Abstract of Talk:
Are you a data scientist or a developer who mostly uses Python? Are you jealous of developers who write Javascript code and build fancy websites in a browser? How nice would it be if we could write websites in Python? PyScript makes it possible! PyScript allows users to write Python in the browser. In this talk, I will introduce PyScript and discuss what PyScript means for data scientists, how PyScript might change the way data scientists work, and how PyScript can be incorporated into the data science workflow.
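As a tiny, hedged illustration based on the 2022 alpha releases of PyScript (tags, URLs, and behaviour may have changed since): an HTML page loads pyscript.css and pyscript.js from pyscript.net and embeds Python inside a <py-script> tag. The snippet below is the kind of Python such a tag could contain; in those early releases, print output was rendered directly into the page.

```python
# Intended to run inside a <py-script> tag on a page that loads
# https://pyscript.net/latest/pyscript.js and pyscript.css (2022 alpha;
# the URLs and tag names are assumptions and may have changed since).
# It is also plain Python, so it runs unchanged outside the browser.
import statistics

sales = [120, 135, 128, 151, 144]          # toy in-browser dataset
print("mean :", statistics.mean(sales))
print("stdev:", round(statistics.stdev(sales), 2))
```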
Can you suggest 2-3 topics for post-discussion?
Visualization, Model Deployment
Presenter:
Steven Waslander, Professor, Institute for Aerospace Studies / Director, Toronto Robotics and AI Laboratory, University of Toronto
About the Speaker:
Prof. Steven Waslander is a leading authority on autonomous aerial and ground vehicles, including multirotor drones and autonomous driving vehicles. He received his B.Sc.E. in 1998 from Queen’s University, his M.S. in 2002 and his Ph.D. in 2007, both from Stanford University in Aeronautics and Astronautics. He founded and directed the Waterloo Autonomous Vehicle Laboratory (WAVELab) from 2008 to 2018, and has led the Toronto Robotics and Artificial Intelligence Laboratory (TRAILab) at the University of Toronto from 2018 onward.
Prof. Waslander’s work on autonomous vehicles has resulted in the Autonomoose, the first autonomous vehicle created at a Canadian university to drive on public roads. His insights into autonomous driving have been featured in the Globe and Mail, Toronto Star, National Post, and the Rick Mercer Report. He has over 160 publications and hosts the Self-Driving Cars Specialization on Coursera, which has accumulated over 150,000 learners worldwide since 2019.
Which talk track does this best fit into?
Technical
Technical level of your talk?
(Technical level: 6/7)
What you’ll learn:
That winter driving is harder than clear-weather driving, but that we can still build safe self-driving cars for any weather condition if we take the time to work through the added challenges.
Abstract of Talk:
Autonomous driving solutions are steadily progressing toward real-world deployments, but most companies are focused on driving in clear weather days in benign climates. Our work on exposing the challenges of Canadian winters for perception tasks has led to the Canadian Adverse Driving Conditions Dataset, and to multiple advances in all-weather autonomy that set the stage for more complete dominion of robotics systems over sensor degradation due to precipitation and accumulation. In this talk, I’ll highlight some of the worst problems that arise for autonomous systems in Canada, and lay out our plans for WinTOR, a new University of Toronto research program aimed at helping self-driving vehicles extend their range to our roadways year round.
Presenter:
Nestor Maslej, Research Manager, AI Index, Stanford Institute for Human-Centered AI
About the Speaker:
Nestor Maslej is a Research Manager at Stanford’s Institute for Human-Centered Artificial Intelligence (HAI). In this position, he manages the AI Index and Global AI Vibrancy Tool. Nestor also leads research projects that study AI in the context of technical advancement, ethical concerns and policymaking. In developing tools that track the advancement of AI, Nestor hopes to make the AI space more accessible to policymakers.
Nestor also speaks frequently about trends in AI. He has delivered presentations about the AI Index to teams at the World Economic Forum, Centre for Data Ethics and Innovation and Global Arena Research Institute. Nestor has also testified to the Canadian Parliament’s House of Commons Standing Committee on Access to Information, Privacy and Ethics on the use and impact of facial recognition technology in Canada.
Prior to joining HAI, Nestor worked in Toronto as an analyst at several startups. He graduated from the University of Oxford in 2021 with an MPhil in Comparative Government, where he used machine learning methodologies to study the Canadian Indian Residential School system, and from Harvard College in 2017 with an A.B. in Social Studies.
Which talk track does this best fit into?
Technical
Technical level of your talk?
(Technical level: 3/7)
What you’ll learn:
That AI is here in a way that it was not before, and that as a society, we need to think critically about the role AI should play in our lives.
Abstract of Talk:
Learn about some of the main trends in AI, as told to you by the 2022 AI Index Report. The AI Index is one of the most widely read annual reports on trends in AI and has informed AI policymakers and industry leaders across the globe. This presentation covers some of the main trends explored in the report, namely trends in areas such as research and development, technical advancement, ethics, economics, policy and education.
Presenter:
Jordan Shaw, Creative Technology Lead, Half Helix
About the Speaker:
Jordan Shaw is an artist and creative technologist who was raised in, and is currently based in, Toronto, Canada. He grew up in Scarborough and received his MFA from OCAD University’s Digital Futures program, leading to his thesis being exhibited during Vector Festival at InterAccess. Before that, he completed his undergraduate degrees at Carleton University and Algonquin College, where his final installation was exhibited and recognized at ACM SIGGRAPH.
His work is concerned with exposing the hidden and unseen aspects of technology and the digital environment around us. It tries to visualize the hidden interactions between people, technology, data collection, and the digital systems trying to understand the physical world.
Jordan has exhibited internationally in Australia, Canada, Germany, Spain and the United States of America.
Which talk track does this best fit into?
Technical
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
The exploration, evolution and progression of ML throughout the last couple of years through a creative lens.
Abstract of Talk:
A creative evolution of ML in the arts. The speaker started producing artwork and exhibiting ML pieces in 2015. This reflective perspective on the progression of ML and AI, combined with observations of creative model biases, offers a unique view of the future of ML in creative fields and of how it may influence popular culture down the road.
Presenters:
Ali Sabet, ML Engineer, Cohere & Royal Sequeira, AI Research Scientist, LG Electronics Toronto AI Lab
About the Speakers:
Ali is a Machine Learning Engineer at Cohere, working on both text and image generation. He has built viral apps, made fundamental contributions to the instruction training rolled out in Cohere’s text products, and leads image generation capabilities at Cohere.
Royal Sequeira is an AI Research Scientist at LG Toronto AI Lab. He received his Master’s in Computer Science from the University of Waterloo. In the past, he has worked at Microsoft Research India and at Ada Support Inc. in Toronto. In 2018, he founded Sushiksha, a mentorship organization that helps hundreds of students across India.
Which talk track does this best fit into?
Technical
Presenter:
Piero Molino, CEO & Co-Founder, Predibase
Which talk track does this best fit into?
Technical
Technical level of your talk?
(Technical level: 5/7)
Abstract of Talk:
Declarative machine learning systems are a new trend that marries the flexibility of DIY machine learning infrastructure with the simplicity of AutoML solutions. In this talk, we will discuss Ludwig, the open-source declarative deep learning framework, and Predibase, an enterprise-grade solution based on it.
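As a hedged sketch of what "declarative" means here, based on Ludwig's public documentation around the time of this talk (field names and the exact API may differ across versions, and the CSV path is hypothetical): you describe what the inputs and outputs are, and the framework chooses encoders, preprocessing, and the training loop.

```python
from ludwig.api import LudwigModel

# Declarative config: say *what* the features are, not *how* to train.
# Feature names and the dataset file below are made-up placeholders.
config = {
    "input_features": [
        {"name": "review_text", "type": "text"},
        {"name": "product_category", "type": "category"},
    ],
    "output_features": [
        {"name": "sentiment", "type": "category"},
    ],
}

model = LudwigModel(config)
results = model.train(dataset="reviews.csv")  # hypothetical CSV of labelled rows
```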
Presenter:
Liran Hason, CEO, Aporia
About the Speaker:
Liran Hason is the Co-Founder and CEO of Aporia, a full-stack ML observability platform used by Fortune 500 companies and data science teams across the world to ensure responsible AI. Prior to founding Aporia, Liran was an ML Architect at Adallom (acquired by Microsoft), and later an investor at Vertex Ventures. Liran created Aporia after seeing first-hand the effects of AI without guardrails. In 2022, Forbes named Aporia one of the “Next Billion-Dollar Companies”.
Which talk track does this best fit into?
Business Strategy
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
We’ll get down to the core reasons, and discuss a framework for building successful ML products, achieving data science-business alignment, and accomplishing trust and model value recognition.
Abstract of Talk:
F1 score, OKRs, precision, and KPIs are more related than you’d think.
As an ML engineer and business leader, Liran will talk about the frustration that both data scientists and business stakeholders experience with ML projects.
Vice President and General Manager of Global Affairs, Economics and Public Policy, RIWI (Real-Time Interactive Worldwide Intelligence)
Opening speaker, TEDx Toronto: The Smartest Way to Predict the Future; developing a U of Toronto course on new data tools for global affairs / public policy; see also LinkedIn
Talk: The Risks of Excluding the Disengaged From your Dataset
Abstract: Analysts use machine learning tools to analyze the vast amounts of data now available. Big data is appealing, but even the best tools and techniques will give you noise if the underlying data aren’t robust and inclusive. To reliably predict elections, correctly anticipate consumer demand or to accurately predict the trajectory of a pandemic, leaders need to ask themselves who is (and isn’t) included in their dataset.
What You’ll Learn: There is a huge emphasis on tools to analyze big data but we need to spend at least as much time on the quality of the underlying dataset. If, for example, we exclude disengaged populations, as some common methods do, we risk getting things wrong, whether understanding a market’s potential, predicting an election result, or anticipating the trajectory of the pandemic, the war, or the economy.
Track: Business
Technical Level: 1
Location: Toronto
Presenter:
Danielle Goldfarb, Vice President and General Manager of Global Affairs, Economics and Public Policy, RIWI (Real-Time Interactive Worldwide Intelligence)
About the Speaker:
Opening speaker, TEDx Toronto: The Smartest Way to Predict the Future; developing a U of Toronto course on new data tools for global affairs / public policy; see also LinkedIn
Which talk track does this best fit into?
Business Strategy
Technical level of your talk?
(Technical level: 1/7)
What you’ll learn:
There is a huge emphasis on tools to analyze big data but we need to spend at least as much time on the quality of the underlying dataset. If, for example, we exclude disengaged populations, as some common methods do, we risk getting things wrong, whether understanding a market’s potential, predicting an election result, or anticipating the trajectory of the pandemic, the war, or the economy.
Abstract of Talk:
Analysts use machine learning tools to analyze the vast amounts of data now available. Big data is appealing, but even the best tools and techniques will give you noise if the underlying data aren’t robust and inclusive. To reliably predict elections, correctly anticipate consumer demand or to accurately predict the trajectory of a pandemic, leaders need to ask themselves who is (and isn’t) included in their dataset.
Presenters:
David Van Bruwaene, Founder and CEO, Fairly AI & Fion Lee-Madan, Co-Founder and COO, Fairly AI & Susie Lindsay, Counsel, Law Commission of Ontario
About the Speakers:
Fion has over 20 years of experience in enterprise software as a Solutions Architect (ex-Sapient, ex-Intuit, and ex-ATG, which was acquired by Oracle). She double majored in Computer Science and Human Biology at the University of Toronto and has an MBA from Boston University. She is a technical committee member of the CIO Strategy Council of Canada. She is a champion of DE&I and a major supporter of women in tech, as both a mentor and a coach. She guest lectures on AI Ethics at Lighthouse Labs, an education company with a goal to bring more diversity into the Data Science field.
David has developed a deep understanding of ethics and formal logic, model theory, and Natural Language Processing (NLP) throughout his career in business and academia. He taught Business Ethics at the University of Waterloo and graduate-level Ethics at Cornell, and is a sought-after speaker on AI Ethics, Compliance and Risk Management at conferences around the world. On an exchange scholarship from Cornell to Berkeley, he became fascinated with powerful Natural Language Processing. David applied this background to cyberbullying and related NLP in his first AI startup, ViSR (acquired by SafetoNet), where he was the Head of Data Science and later became the CEO and a Board Member. His unique perspective, moving from practicing Data Scientist to CEO and Board Member, made David conscious of the tensions between technical and business decisions and of their impact on people’s lives in the resulting automation of decision-making.
Susie is Counsel at the Law Commission of Ontario, where she leads numerous LCO projects including AI and the Civil Justice System, Protection Orders, and the LCO’s joint initiative on AI and Human Rights with the Ontario Human Rights Commission and the Canadian Human Rights Commission. Before joining the Commission, Susie practised regulatory law at a large communications company and civil defence litigation at a boutique litigation firm. Susie is a graduate of Queen’s Law School and a Fulbright Scholar, has a Master of Laws from Harvard Law School, and was a fellow at the Berkman Klein Center for Internet & Society at Harvard University.
Which talk track does this best fit into?
Business Strategy
Technical level of your talk?
(Technical level: 2/7)
What you’ll learn:
Latest expert views on managing AI risks in the fast changing regulatory environment
Abstract of Talk:
Artificial Intelligence (AI) is a tool that can be used to manage risks in high-stakes industries such as financial services, but it also poses some risks of its own. This expert panel will discuss AI yesterday versus AI today, focusing on how AI has evolved and developed from simple automation to complex decisioning. Our experts will cover AI regulatory trends globally and in the US, best practices in risk and compliance management, and complementary technologies that can be utilized to counter new and emerging AI risks.
Presenters:
Shazia Akbar, Lead Machine Learning Engineer, Altis Labs & Ali Madani, Director of Machine Learning, Cyclica & Santosh Hariharan, Principal Scientist, Pfizer & Shiva Amiri, VP, Head of AI and Data Intelligence, Pivotal Life Sciences & Javier Diaz-Mejia, Head of Data Science, Phenomic AI
About the Speakers:
Dr. Shazia Akbar is the lead machine learning engineer at Altis Labs, a Toronto-based startup which leverages deep learning technologies to gain prognostic insight from medical imaging data. Since joining Altis Labs, Shazia has designed and developed artificial intelligence systems that ingest millions of images to predict patient outcomes. Some of the applications she has developed to date include a fully automated model to quantify mortality risk in early-stage lung cancer patients, and an x-ray model which determines the in-patient admission risk of hospital patients diagnosed with community-acquired pneumonia.
Shazia gained her PhD from the University of Dundee, UK, after which she joined the department of Radiology at New York University, US. In 2018, Shazia completed her postdoctoral fellowship at Sunnybrook Research Institute and the Vector Institute, designing novel deep learning algorithms for digital pathology. Her research interests include explainable AI, weakly supervised learning and applications of AI in healthcare.
Ali Madani leads ML technology development at Cyclica, a leading Canadian biotechnology company focused on AI-based drug discovery. He is also editor of the special topic Artificial Intelligence in Cancer Diagnosis and Therapy at MDPI and works as an AI educator with companies like WeCloudData. Ali is a PhD graduate of the University of Toronto, an alumnus of the University of Waterloo School of Engineering, and holds a Master of Mathematics from the University of Waterloo. He is an active member of the machine learning community in Toronto and speaks at conferences, webinars and workshops worldwide and across Canada about technology development, machine learning, drug discovery and cancer therapeutics. He has also published more than 20 scientific articles in high-impact-factor journals on these subjects.
Santosh Hariharan is a Principal Scientist at Pfizer, committed to curing disease and improving patient lives. He enjoys solving complex biological problems using simple blocks, with a motto of “Seeing is Believing”. He develops and analyzes complex biology by looking at individual cells, evaluating their response to drug/genetic perturbations, and developing predictive models using AI and machine learning (phenotyping).
Shiva Amiri is the VP, Head of AI and Data Intelligence at Pivotal Life Sciences, working at the intersection of computing and biology, with experience in large-scale, multi-stakeholder technology development in data science and biology, focused on computational biology, bioinformatics, machine learning, and big data systems. Shiva is a team builder and an entrepreneur in cutting-edge computational methods for biology, digital health and medical research, with a track record in big data/data science and in program execution and strategy.
Javier is a data scientist with 15+ years of experience in projects aiming to solve problems of relevance for human health. He has experience in the academic, non-profit and industry sectors in Mexico, the USA and Canada. Javier’s role involves identifying an organization’s data science needs, building teams to implement solutions addressing those needs, and serving as a bridge between technical and executive stakeholders. Javier completed his PhD in Mexico and his postdoctoral training in Toronto, and currently works as Head of Data Science at Phenomic AI, a biotech startup developing machine learning tools to fight cancer.
Which talk track does this best fit into?
Business
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
Examples of application of ML in healthcare; Impact of ML in healthcare technologies on patients; Potential biases in ML in Healthcare technologies
Abstract of Talk:
Machine Learning (ML) technologies learn how to accomplish tasks and identify patterns from available data. This data-based learning has been critical in helping the healthcare institute, biotechnology and pharmaceutical companies in developing new technologies to improve processes like disease diagnosis from radiological and pathological images and drug design. In this panel, we will discuss how companies design ML technologies that are not only in-line with their business model but eventually will impact patients and healthcare systems. We will also discuss the technological and sociological biases that need to be taken into account in the design of such technologies.
Presenters:
Nicolás Venegas Oliva, Technical Lead of Advanced Analytics, LATAM Airlines & Sarah Sun, Director Data Science, Scotiabank
About the Speakers:
Nicolas Matias Venegas Oliva has 2 years of experience in backend development, 2+ years in data processing, and the last 3+ years as the Advanced Analytics technical leader at LATAM Airlines. During this time, the team has grown from 9 to 48 highly trained professionals. It has also become the team generating the highest impact within the company and a regional reference in MLOps and in measured business impact through data products.
A decade in data has taken Sarah across multiple industries, including banking, technology, and natural resources. While specializing in data strategy, she was trained as a data scientist and has worked across the industry in innovation, governance, and AI, and has also had a stint as CEO of a startup. Working in data has taught Sarah some valuable lessons, everything from seizing opportunities, to the importance of mental health, to the power of sharing stories. Sarah was named one of the Women’s Executive Network’s Top 100 Most Powerful Women in 2019.
Which talk track does this best fit into?
Business Strategy
Technical level of your talk?
(Technical level: 1/7)
What you’ll learn:
Lessons to be learnt from the experiences shared about what not to do. We like to talk about the wins, but the reality is that failures are more frequent, and they hold more lessons to be learnt!
Abstract of Talk:
We like to talk about the successes…but why don’t we ever talk about the FAILS across the data world? Join us as we swap stories crossing data, industrial, and even geographical boundaries. We may have failed…but maybe amongst all the tales there’s a lesson or two to be learnt across networking, recruiting, planning, model building, engineering….you name it ;)
Unity Health Toronto – VP: Data Science and Advanced Analytics; Director: Temerty Centre for Artificial Intelligence Research and Education in Medicine of the University of Toronto; Professor – University of Toronto
Dr. Mamdani is Vice President of Data Science and Advanced Analytics at Unity Health Toronto and Director of the University of Toronto Temerty Faculty of Medicine Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM). Dr. Mamdani’s team bridges advanced analytics including machine learning with clinical and management decision making to improve patient outcomes and hospital efficiency. Dr. Mamdani is also Professor in the Department of Medicine of the Temerty Faculty of Medicine, the Leslie Dan Faculty of Pharmacy, and the Institute of Health Policy, Management and Evaluation of the Dalla Lana Faculty of Public Health. He is also adjunct Senior Scientist at the Institute for Clinical Evaluative Sciences (ICES) and a Faculty Affiliate of the Vector Institute. In 2010, Dr. Mamdani was named among Canada’s Top 40 under 40. He has published over 500 studies in peer-reviewed medical journals. Dr. Mamdani obtained a Doctor of Pharmacy degree (PharmD) from the University of Michigan (Ann Arbor) and completed a fellowship in pharmacoeconomics and outcomes research at the Detroit Medical Center. During his fellowship, Dr. Mamdani obtained a Master of Arts degree in Economics from Wayne State University in Detroit, Michigan with a concentration in econometric theory. He then completed a Master of Public Health degree from Harvard University in 1998 with a concentration in quantitative methods.
Talk: Saving Lives with ML: Applications and Learnings
Abstract: Machine learning (ML) has transformed numerous industries but its application in healthcare has been limited. ML applications are expected to permeate healthcare in the near future with a recent explosion in academic and commercial activity. The application of ML in healthcare, however, is complicated by a variety of factors including the significant variability in needs, healthcare settings and patients served in these settings, workflows, and available resources. This talk will present a case study of Unity Health Toronto and its journey in developing and deploying numerous ML solutions into clinical practice, including bridging public and private sector partnerships to spread innovations internationally. The talk will also present a novel Canadian academic centre dedicated to artificial intelligence (AI) in medicine – the Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM) at the University of Toronto.
What You’ll Learn: The successful application of ML in healthcare is multifaceted and highly dependent on end-user engagement.
Innovative public-private partnerships are needed to spread ML applications globally.
Multidisciplinary, collaborative efforts will fuel innovations in the development and application of ML in healthcare.
Track: Case Study
Technical Level: 3
Location: Toronto
Presenter:
Muhammad Mamdani, Unity Health Toronto – VP: Data Science and Advanced Analytics; Director: Temerty Centre for Artificial Intelligence Research and Education in Medicine of the University of Toronto; Professor – University of Toronto
About the Speaker:
Dr. Mamdani is Vice President of Data Science and Advanced Analytics at Unity Health Toronto and Director of the University of Toronto Temerty Faculty of Medicine Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM). Dr. Mamdani’s team bridges advanced analytics including machine learning with clinical and management decision making to improve patient outcomes and hospital efficiency.
Dr. Mamdani is also Professor in the Department of Medicine of the Temerty Faculty of Medicine, the Leslie Dan Faculty of Pharmacy, and the Institute of Health Policy, Management and Evaluation of the Dalla Lana Faculty of Public Health. He is also adjunct Senior Scientist at the Institute for Clinical Evaluative Sciences (ICES) and a Faculty Affiliate of the Vector Institute. In 2010, Dr. Mamdani was named among Canada’s Top 40 under 40. He has published over 500 studies in peer-reviewed medical journals.
Dr. Mamdani obtained a Doctor of Pharmacy degree (PharmD) from the University of Michigan (Ann Arbor) and completed a fellowship in pharmacoeconomics and outcomes research at the Detroit Medical Center. During his fellowship, Dr. Mamdani obtained a Master of Arts degree in Economics from Wayne State University in Detroit, Michigan with a concentration in econometric theory. He then completed a Master of Public Health degree from Harvard University in 1998 with a concentration in quantitative methods.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 3/7)
Are there any industries (in particular) that are relevant for this talk?
Hospital & Health Care
Who is this presentation for?
The successful application of ML in healthcare is multifaceted and highly dependent on end-user engagement.
Innovative public-private partnerships are needed to spread ML applications globally.
Multidisciplinary, collaborative efforts will fuel innovations in the development and application of ML in healthcare.
Abstract of Talk:
Machine learning (ML) has transformed numerous industries but its application in healthcare has been limited. ML applications are expected to permeate healthcare in the near future with a recent explosion in academic and commercial activity. The application of ML in healthcare, however, is complicated by a variety of factors including the significant variability in needs, healthcare settings and patients served in these settings, workflows, and available resources. This talk will present a case study of Unity Health Toronto and its journey in developing and deploying numerous ML solutions into clinical practice, including bridging public and private sector partnerships to spread innovations internationally. The talk will also present a novel Canadian academic centre dedicated to artificial intelligence (AI) in medicine – the Temerty Centre for Artificial Intelligence Research and Education in Medicine (T-CAIREM) at the University of Toronto.
Director of Advanced Analytics, Coca-Cola
Nikita has over 10 years of experience in the Retail and Consumer Packaged Goods industries, working for companies like Loblaw and Sears. He is also an alumnus of the Master of Management Analytics program from Queen’s University, and holds a Bachelor of Finance & Economics degree from the University of Toronto.
Co-Presenter: Winston Li
Talk: The Application of Mobile Location Data for Vending Machine Site Selection and Revenue Optimization.
Abstract: In this presentation, we present an innovative approach to utilizing mobility data to optimize the placement of vending machines in Canada. Coca-Cola has more than 10k vending machines in various locations, and their ROI depends heavily on the amount of foot traffic next to them as well as who those people are. For this use case, we’ll concentrate on using highly granular mobility data to understand, at scale, what separates our best machines from our worst, and on optimizing machine locations based on that data to increase ROI. In addition to the practical business application, we’ll also share the algorithms used and the tech stack with the audience.
What You’ll Learn: Mobility data as an alternative data source for consumer-related analytics, and how its recency and granularity can drive measurable business outcomes.
Track: Case Study
Technical Level: 4
Location: Toronto
Presenters:
Nikita Medvedev, Director of Advanced Analytics, Coca-Cola Canada Bottling Limited & Winston Li, Founder, Arima
About the Speaker:
Winston is the founder of Arima, a Canada-based startup that provides consumer data to its users. Arima’s flagship product, the Synthetic Society, is a privacy-by-design, individual-level database that mirrors the real society. Built using trusted sources like census, market research, mobility, and purchase patterns, it contains 10k+ attributes across North America and enables advanced modelling at the most granular level.
Prior to founding Arima, Winston was the Director of Data Science at PwC and Omnicom. Winston is also a part-time faculty member at Northeastern University Toronto and sits on the advisory board of the Master of Analytics program.
Nikita is the Director of Advanced Analytics at Coca-Cola Canada Bottling Limited. Together with his team he is transforming terabytes of business operations data into actionable insights to drive growth and innovate in the Consumer Packaged Goods industry. He loves finding novel solutions to old problems and is obsessed with driving real lasting change through better use of data.
Nikita has over 10 years of experience in the Retail and Consumer Packaged Goods industries, working for companies like Loblaw and Sears. He is also an alumnus of the Master of Management Analytics program from Queen’s University, and holds a Bachelor of Finance & Economics degree from University of Toronto.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 4/7)
Are there any industries (in particular) that are relevant for this talk?
Food & Beverages, Information Technology & Service, Marketing & Advertising
What are the main core message (learning) you want attendees to take away from this talk?
Mobility data as an alternative data source for consumer-related analytics, and how its recency and granularity can drive measurable business outcomes.
Abstract of Talk:
In this presentation, we present an innovative approach to utilizing mobility data to optimize the placement of vending machines in Canada. Coca-Cola has more than 10k vending machines in various locations, and their ROI depends heavily on the amount of foot traffic next to them as well as who those people are. For this use case, we’ll concentrate on using highly granular mobility data to understand, at scale, what separates our best machines from our worst, and on optimizing machine locations based on that data to increase ROI. In addition to the practical business application, we’ll also share the algorithms used and the tech stack with the audience.
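To make this kind of analysis concrete, here is a minimal, hypothetical sketch (not Coca-Cola’s actual algorithm or tech stack) of scoring candidate vending-machine sites by nearby foot traffic from anonymized mobility pings; the column names and radius are illustrative assumptions.

```python
# Illustrative sketch only: rank candidate sites by the number of unique
# devices whose pings fall within a fixed radius of each site.
import numpy as np
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in metres."""
    r = 6_371_000
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = np.radians(lat2 - lat1), np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def foot_traffic_score(sites: pd.DataFrame, pings: pd.DataFrame, radius_m: float = 100.0):
    """sites: columns [site_id, lat, lon]; pings: columns [device_id, lat, lon]."""
    scores = []
    for _, site in sites.iterrows():
        d = haversine_m(site.lat, site.lon, pings.lat.values, pings.lon.values)
        scores.append(pings.loc[d <= radius_m, "device_id"].nunique())
    return sites.assign(traffic=scores).sort_values("traffic", ascending=False)
```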
Senior Engineering Manager – Safety, MLOps and Infrastructure, Amazon/Twitch
I work as a Software Engineering Manager at Twitch, focusing on MLOps and tooling in the Safety team. I spoke at Meta’s At Scale about Scaling ML Workflows for Real-Time Moderation Challenges at Twitch, and at TwitchCon about Integrating Data into Twitch at Scale. I have worked in engineering leadership roles for five years, and our team has built several company-wide MLOps tools, including an orchestration system and a feature store.
Co-Presenter: Chen Liu
Talk: From Silo to Collaboration – Building Tooling to Support Distributed ML Teams at Twitch
Abstract: In this talk, we will cover Twitch’s current ML team structure and its challenges. Then we dive deep into some of the solutions we have built to support ML development at Twitch, including what they are and how they improve the situation. We close with a discussion of Twitch’s distributed ML team style and how we collaborate, using Conductor as an example.
ML has been playing an increasingly important role in Twitch’s products (e.g. Recommendation, Safety). To allow products to iterate fast, we keep ML practitioners in the product teams and empower those teams to work independently. Nevertheless, there are common challenges in ML development regardless of product area, so we are striving to develop tooling and infrastructure for general ML development in order to reduce duplicate work across ML teams. We will dive into those efforts in this presentation. For example, Twitch’s machine learning feature store provides a single control plane serving as a feature registry while facilitating distributed feature ownership (e.g. storage, pipelines). Conductor, an in-house ML orchestration system, promotes best practices in pipeline management with templated process control flow and distributed infrastructure management. Meanwhile, we are promoting a collaborative ML culture among Twitch engineering teams, similar to community-owned open source projects where teams share the same interests and encourage cross-team contribution and development.
What You’ll Learn: Twitch’s strategy for scaling our ML infra and MLOps tooling has not been discussed publicly before. We aim to help the audience figure out the best strategy for using ML tooling to enhance collaboration between ML teams and boost scientists’ self-service and efficiency. This is a useful lesson for companies seeking to start MLOps from scratch.
Track: Case Study
Technical Level: 4
Presenters:
Shiming Ren, Sr. Engineering Manager – Safety, MLOps and Infrastructure & Chen Liu, Twitch Sr. Engineering Manager on Personalization and ML Infra, Amazon/Twitch
About the Speaker:
Shiming works as a Software Engineering Manager at Twitch, focusing on MLOps and tooling in the Safety team. He spoke at Meta’s At Scale about Scaling ML Workflows for Real-Time Moderation Challenges at Twitch, and at TwitchCon about Integrating Data into Twitch at Scale. He has worked in engineering leadership roles for five years, and his team has built several company-wide MLOps tools, including an orchestration system and a feature store.
Chen is currently supporting teams working on personalization and ML infrastructure at Twitch. He is passionate about building scalable ML products and democratizing ML in the organization.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 4/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Data Scientists/ ML Engineers, ML Engineers
What you’ll learn:
Twitch’s strategy for scaling our ML infra and MLOps tooling has not been discussed publicly before. We aim to help the audience figure out the best strategy for using ML tooling to enhance collaboration between ML teams and boost scientists’ self-service and efficiency. This is a useful lesson for companies seeking to start MLOps from scratch.
On a scale of 1-10 how mature is this applied AI application you plan to discuss?
7/10
Pre-requisite Knowledge:
Feature store, Orchestration, Large Scale Data Handling
What kind of DevOps tools you plan to discuss? Open source?
N/A – our tools are all in-house
What are some of the languages you plan to discuss?
Python, Golang
What are some of the infrastructures you plan to discuss?
Feature Store, ML Orchestration, Realtime Inference, Distributed ML team collaborations
What is unique about this speech, from other speeches given on the topic?
We will use examples of how Twitch built its in-house feature store, real-time inference, and orchestration system to demonstrate, from a technology perspective, how MLOps collaboration works within a company. This is a hybrid technical and management talk that will benefit both engineering and leadership audiences.
Abstract of Talk:
[High level intro]
In this talk, we will cover Twitch’s current ML team structure and its challenges. Then we dive deep into some of the solutions we have built to support ML development at Twitch, including what they are and how they improve the situation. We close with a discussion of Twitch’s distributed ML team style and how we collaborate, using Conductor as an example.
[Actual abstract]
ML has been playing an increasingly important role in Twitch’s products (e.g. Recommendation, Safety). To allow products to iterate fast, we keep ML practitioners in the product teams and empower those teams to work independently. Nevertheless, there are common challenges in ML development regardless of product area, so we are striving to develop tooling and infrastructure for general ML development in order to reduce duplicate work across ML teams. We will dive into those efforts in this presentation. For example, Twitch’s machine learning feature store provides a single control plane serving as a feature registry while facilitating distributed feature ownership (e.g. storage, pipelines). Conductor, an in-house ML orchestration system, promotes best practices in pipeline management with templated process control flow and distributed infrastructure management. Meanwhile, we are promoting a collaborative ML culture among Twitch engineering teams, similar to community-owned open source projects where teams share the same interests and encourage cross-team contribution and development.
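As a rough illustration of the "single control plane, distributed ownership" pattern described above (not Twitch’s actual feature store or Conductor code), a central registry might track only feature metadata while each team keeps ownership of its own storage and pipelines; all names here are hypothetical.

```python
# Minimal sketch: a feature registry as a control plane for discovery and
# metadata only; no data movement happens through the registry itself.
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    name: str          # e.g. "channel_report_rate_7d" (hypothetical)
    owner_team: str    # team that owns the pipeline and storage
    storage_uri: str   # where the owning team materializes the feature
    entity: str        # join key, e.g. "channel_id"

class FeatureRegistry:
    """Single control plane shared by all teams."""
    def __init__(self) -> None:
        self._specs: dict[str, FeatureSpec] = {}

    def register(self, spec: FeatureSpec) -> None:
        if spec.name in self._specs:
            raise ValueError(f"feature {spec.name!r} already registered")
        self._specs[spec.name] = spec

    def lookup(self, name: str) -> FeatureSpec:
        return self._specs[name]

# Usage: one team registers a feature it owns; any other team can discover it
# and read from the owning team's storage location.
registry = FeatureRegistry()
registry.register(FeatureSpec(
    name="channel_report_rate_7d",
    owner_team="safety",
    storage_uri="s3://safety-features/channel_report_rate_7d/",
    entity="channel_id",
))
print(registry.lookup("channel_report_rate_7d").owner_team)
```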
Can you suggest 2-3 topics for post-discussion?
Manage ML teams collaboration in a distributed manner; ML tooling development from 0 to 10; Implementation details for feature store and ML orchestration system.
Lead Data Scientist, FreshBooks
Valerii joined FreshBooks a year ago to lead and grow a team of Data Scientists and Machine Learning Engineers. He has experience in multiple industries ranging from Electronics to Clean Tech and has contributed to the development of innovative solutions for a variety of brands such as LG Electronics, Panasonic, Samsung, Toyota, Scotiabank, and Cineplex. He has a university degree in Telecom Engineering and a PhD in Automated Control Systems, and is the author of 20 patented inventions in Signal Processing, Electronics, and Computing.
Talk: Building a Fully Automated ML Platform Using Kubeflow and a Declarative Approach to Development of End-to-End ML Pipelines
Abstract: Recent innovations in the ML ecosystem have seen the emergence of operationally-focused technology like declarative systems and data-centric AI. These techniques appear to be a radical change for AI practitioners, who can now more simply frame use cases and manage workflows. In this talk, we’ll take a look at the history of AI to see the progress that has been made and how we’ve arrived at where we are now. How are high-tech companies handling AI initiatives internally, and why aren’t we all copying them? Has MLOps been the promised solution to simplifying deployment and monitoring of production AI? How do we create a simpler paradigm for operationalizing AI? All these questions and more will be addressed.
What You’ll Learn: A journey to higher levels of MLOps maturity is unique for every company and has no fixed recipe due to the experimental nature of MLOps. Many insights and ideas in this area are the result of investments by big names (Google, Microsoft, Amazon) and of knowledge sharing between smaller companies like us working on similar problems. We are grateful for this opportunity to contribute to the ecosystem so that others can learn from us.
Track: Case Study
Technical Level: 6
Location: Toronto
Presenters:
Valerii Podymov, Lead Data Scientist, FreshBooks & Roshan Isaac, Machine Learning Engineer, FreshBooks & Vlad Ryzhkov, Senior Data Engineer, FreshBooks & Joey Zhou, Senior Data Engineer, FreshBooks
About the Speaker:
Valerii joined FreshBooks a year ago to lead and grow a team of Data Scientists and Machine Learning Engineers. He has experience in multiple industries ranging from Electronics to Clean Tech and has contributed to the development of innovative solutions for a variety of brands such as LG Electronics, Panasonic, Samsung, Toyota, Scotiabank, and Cineplex. He has a university degree in Telecom Engineering and a PhD in Automated Control Systems, and is the author of 20 patented inventions in Signal Processing, Electronics, and Computing.
Roshan works as a Machine Learning Engineer at FreshBooks, where he is building the ML platform on Vertex AI and bringing MLOps best practices to the organization. He previously held the same role at Cineplex. He has a bachelor’s degree in Computer Science and Engineering and holds graduate certificates in AI and Project Management. Overall, he has 8+ years of experience in machine learning, data analytics, and CRM software, working in different startups and companies in Canada and India. He has published papers at IEEE conferences and was a speaker at the Libre Software Meeting (LSM) in France.
Vlad joined FreshBooks a year ago with an extensive data engineering background, and he works on building the ML platform, bringing best practices in large-scale data processing to the company. He has a PhD in System Analysis, Management and Information Processing. Overall, his 15+ years of software development experience span areas such as financial systems, e-commerce, e-sports, and airlines in Canada and overseas.
Joey joined FreshBooks three months ago and works on the continuous monitoring framework for the ML team. Before that, he gained experience in the tech industry, ranging from social dating to e-commerce, in roles such as Data Scientist and Machine Learning Engineer. He built recommender systems for one of the largest e-commerce platforms in China. With hands-on experience in building and productionizing ML models, he is ready to pursue his passion for MLOps at FreshBooks.
Which talk track does this best fit into?
Technical / Research
Technical level of your talk?
(Technical level: 6/7)
Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Computer Software
Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers
What you’ll learn:
How we tackled existing challenges with Kubeflow pipelines by changing from an imperative approach to a declarative one.
What are the main core message (learning) you want attendees to take away from this talk?
A journey to higher levels of MLOps maturity is unique for every company and has no fixed recipe due to the experimental nature of MLOps. Many insights and ideas in this area are the result of investments by big names (Google, Microsoft, Amazon) and of knowledge sharing between smaller companies like us working on similar problems. We are grateful for this opportunity to contribute to the ecosystem so that others can learn from us.
On a scale of 1-10 how mature is this applied AI application you plan to discuss?
9/10
Pre-requisite Knowledge:
Machine Learning Lifecycle
What kind of DevOps tools you plan to discuss? Open source?
GitHub Actions, Kubeflow
What are some of the languages you plan to discuss?
Python, SQL
What are some of the infrastructures you plan to discuss?
BigQuery, Airflow, Vertex AI, containers
What is unique about this speech, from other speeches given on the topic?
Managing MLOps is a highly immature topic with a lack of commonly accepted best practices, so the experience of any company in progressing through MLOps maturity levels is always unique.
Abstract of Talk:
This talk is about our journey at FreshBooks from mostly manual processes in productionizing our ML models to the highest levels of MLOps maturity. First, we briefly go over a list of challenges we faced when working on the ML platform as a hybrid team of Data Scientists, ML Engineers, and Data Ops Engineers. We then provide a more detailed overview of our end-to-end Kubeflow pipelines and of a declarative MLOps framework designed to speed up, simplify, and improve the reliability of ML pipelines at each stage from development to production. We close with lessons learned and what’s next.
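For readers unfamiliar with Kubeflow Pipelines, the following is a minimal kfp v2 sketch of the kind of end-to-end pipeline the talk refers to; the component names and bodies are placeholders, and FreshBooks’ declarative framework would presumably generate pipelines like this from configuration rather than hand-written code.

```python
# Minimal Kubeflow Pipelines (kfp v2) sketch: two components chained into a
# pipeline, compiled to a YAML spec that can run on Vertex AI Pipelines.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def prepare_data(source_table: str) -> str:
    # In a real pipeline: read from BigQuery, validate, and write a dataset.
    return f"gs://example-bucket/datasets/{source_table}.parquet"

@dsl.component(base_image="python:3.10")
def train_model(dataset_uri: str, learning_rate: float) -> str:
    # In a real pipeline: train a model, register it, and return its URI.
    return f"{dataset_uri}.model"

@dsl.pipeline(name="demo-training-pipeline")
def training_pipeline(source_table: str = "invoices", learning_rate: float = 0.05):
    data = prepare_data(source_table=source_table)
    train_model(dataset_uri=data.output, learning_rate=learning_rate)

if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```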
Can you suggest 2-3 topics for post-discussion?
ML Ops, ML Model Governance
Staff Data Scientist, Anheuser-Busch
Eric is a Staff Data Scientist with more than 7 years of experience working at Altair Engineering and Anheuser-Busch. He has a PhD in probability from the University of Toronto, a master’s degree in Applied Math, and an undergraduate degree in Engineering from Queen’s University. He’s also a world champion Blokus player.
Talk: Optimal Beer Pricing: An Optimization Layer for Price Elasticities
Abstract: At Anheuser-Busch, we’re obsessed with price elasticities. When the price of beer changes, how will that affect the volume of beer that we sell? These questions (yes, this is more than one question) have implications all over the business, from price setting to procurement to financial planning. We’ve worked hard to make sure our answers to these questions are as data driven as possible. But once we have a model to produce (and predict) these elasticities, how do we make business decisions based on that? And how do we make sure those business decisions are also as data driven as possible?
In this talk we’ll discuss an optimal pricing layer for beer elasticities. We’ll cover how to use mathematical optimization to make specific price change suggestions at a variety of granularities to help achieve specific business objectives. We’ll consider what objective we actually want to optimize (Profit? Revenue? Market Share?) and see how to use constraints to help smooth the trade-off between these objectives. Finally, we’ll investigate how to ensure our price suggestions stay within the regions where the underlying elasticities models make sense.
Ever wanted to see a real-world example of levelling up your analytics from predictive- to prescriptive-, and do so in the context of price setting (or beer drinking)? Now’s your chance!
What You’ll Learn: How to add an optimization layer to ML models.
Track: Case Study
Technical Level: 2
Location: Toronto
Presenter:
Eric Hart, Staff Data Scientist at Anheuser-Busch
About the Speaker:
Eric is a Staff Data Scientist with more than 7 years of experience working at Altair Engineering and Anheuser-Busch. He has a PhD in probability from the University of Toronto, a master’s degree in Applied Math, and an undergraduate degree in Engineering from Queen’s University. He’s also a world champion Blokus player.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 2/7)
Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Food & Beverages, Marketing & Advertising
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers
What you’ll learn:
Putting a mathematical optimization layer on top of predictive models is still a mostly unused tool in the ML space. It’s very difficult to learn about that from existing resources.
What are the main core message (learning) you want attendees to take away from this talk?
How to add an optimization layer to ML models.
Pre-requisite Knowledge:
Not a lot. We’ll briefly discuss what price-elasticities and mathematical optimization are, but having heard those terms before (with a basic understanding) would help.
What is unique about this speech, from other speeches given on the topic?
I would argue the whole topic is fairly unique (optimization layers for predictive models are not widely used or discussed). In addition, the specifics of trying to work around the realities of the beer industry (especially varying laws about beer pricing across different geographies) add an extra layer of complexity to this already deep problem.
Abstract of Talk:
At Anheuser-Busch, we’re obsessed with price elasticities. When the price of beer changes, how will that affect the volume of beer that we sell? These questions (yes, this is more than one question) have implications all over the business, from price setting to procurement to financial planning. We’ve worked hard to make sure our answers to these questions are as data driven as possible. But once we have a model to produce (and predict) these elasticities, how do we make business decisions based on that? And how do we make sure those business decisions are also as data driven as possible?
In this talk we’ll discuss an optimal pricing layer for beer elasticities. We’ll cover how to use mathematical optimization to make specific price change suggestions at a variety of granularities to help achieve specific business objectives. We’ll consider what objective we actually want to optimize (Profit? Revenue? Market Share?) and see how to use constraints to help smooth the trade-off between these objectives. Finally, we’ll investigate how to ensure our price suggestions stay within the regions where the underlying elasticities models make sense.
Ever wanted to see a real-world example of levelling up your analytics from predictive- to prescriptive-, and do so in the context of price setting (or beer drinking)? Now’s your chance!
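As a toy illustration of the idea (with made-up numbers, not Anheuser-Busch’s model), an optimization layer can take an elasticity estimate from a predictive model and search for the profit-maximizing price within bounds where that estimate is trusted:

```python
# Toy sketch of an optimization layer on top of an elasticity model:
# maximize profit under a constant-elasticity demand curve, constrained to
# prices near the region where the fitted elasticity remains valid.
from scipy.optimize import minimize_scalar

base_price, base_volume = 20.0, 1000.0   # current price and weekly volume (assumed)
elasticity, unit_cost = -1.8, 9.0        # model output and cost assumption

def volume(price: float) -> float:
    # Constant-elasticity demand: volume scales with (price / base_price)^elasticity.
    return base_volume * (price / base_price) ** elasticity

def neg_profit(price: float) -> float:
    return -(price - unit_cost) * volume(price)

# Only search within +/-15% of the current price, where the elasticity
# estimate is assumed to hold.
result = minimize_scalar(neg_profit,
                         bounds=(0.85 * base_price, 1.15 * base_price),
                         method="bounded")
print(f"suggested price: {result.x:.2f}, expected profit: {-result.fun:.0f}")
```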
Can you suggest 2-3 topics for post-discussion?
Optimization Layers. Price Elasticities.
Staff Applied Scientist, Loblaw Digital
Jawad currently works as a Staff Applied Scientist at Loblaw Digital, supporting ML teams building personalization and recommender systems for different lines of business of Loblaw Companies.
He has 8 years of industry experience in Applied AI/ML. Previously, he worked at Flipp, Dialpad and McKinsey Solutions.
His areas of interest include applying ML research to build products with scalable ML solutions in NLP, Conversational AI, Computer Vision, and Recommender Systems. Read more on LinkedIn.
Talk: Solving Product Substitutions, The #1 Problem in Grocery E-Commerce – Through Self-Supervised ML
Abstract:
Background: Loblaw Companies Ltd is the largest grocery retailer in Canada. It operates multiple popular banners, with Real Canadian Superstore, No Frills, and T&T being some of the most popular. Grocery e-commerce has become a significant part of the business, accounting for more than $2 billion in sales per year.
Problem: Shopping for groceries online is an inherently different process than shopping in person. We take for granted the in-store shopper’s ability to make quick decisions on the fly when faced with the issue of product availability.
We fulfill orders from stores to ensure freshness, but store inventory is highly dynamic. Items are sometimes collected a day or two after the order is placed, depending on the customer’s delivery date, so whether we can keep our promise on an item is affected by many factors, some of which we cannot control. Thus, we need a solution for substituting items that are out of stock at the time of picking, to make sure the customer experience is minimally impacted. While shopping at a physical store, a customer can make a suitable choice of an alternative. In the e-commerce grocery shopping process, either the customer has to select the substitute, or the Loblaw employee picking the order on behalf of the customer needs a relevant suggestion of the best substitute for the given item, personalized for the given customer.
Loblaw has historical data available on what selection was made by customers from the list of various possible substitute options available for a given item. Additionally, there is data available on the choices made by pickers – the employees who shop at the store to fulfill customers’ orders. This provides us an opportunity to tailor product similarities toward product substitutions that are tied to business metrics.
Solution: We explored multiple solutions to this problem. The most promising solution, which we will present, leverages features extracted from product text descriptions and images. In this talk, we will discuss how our approach evolved over time and how this cutting-edge self-supervised method is a big improvement over traditional techniques.
What You’ll Learn: The talk covers the data curation process by which we prepared a benchmark product substitutions dataset using historical, human-selected substitution data at Loblaw.
The audience will learn about the self-supervised ML approaches we used to recommend product substitutions, benchmarked against the aforementioned product substitutions test set.
Track: Case Study
Technical Level: 5
Location: Waterloo, ON
Presenter:
Jawad Ahmed, Staff Applied Scientist, Loblaw Digital
About the Speaker:
Jawad currently works as a Staff Applied Scientist at Loblaw Digital, supporting ML teams building personalization and recommender systems for different lines of business of Loblaw Companies.
He has 8 years of industry experience in Applied AI/ML. Previously, he worked at Flipp, Dialpad and McKinsey Solutions.
His areas of interest include applying ML research to build products with scalable ML solutions in NLP, Conversational AI, Computer Vision, and Recommender Systems. Read more on LinkedIn.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
The talk covers the data curation process by which we prepared a benchmark product substitutions dataset using historical, human-selected substitution data at Loblaw.
The audience will learn about the self-supervised ML approaches we used to recommend product substitutions, benchmarked against the aforementioned product substitutions test set.
Abstract of Talk:
Background: Loblaw Companies Ltd is the largest grocery retailer in Canada. It operates multiple popular banners, with Real Canadian Superstore, No Frills, and T&T being some of the most popular. Grocery e-commerce has become a significant part of the business, accounting for more than $2 billion in sales per year.
Problem: Shopping for groceries online is an inherently different process than shopping in person. We take for granted the in-store shopper’s ability to make quick decisions on the fly when faced with the issue of product availability.
We fulfill orders from stores to ensure freshness, but store inventory is highly dynamic. Items are sometimes collected a day or two after the order is placed, depending on the customer’s delivery date, so whether we can keep our promise on an item is affected by many factors, some of which we cannot control. Thus, we need a solution for substituting items that are out of stock at the time of picking, to make sure the customer experience is minimally impacted. While shopping at a physical store, a customer can make a suitable choice of an alternative. In the e-commerce grocery shopping process, either the customer has to select the substitute, or the Loblaw employee picking the order on behalf of the customer needs a relevant suggestion of the best substitute for the given item, personalized for the given customer.
Loblaw has historical data available on what selection was made by customers from the list of various possible substitute options available for a given item. Additionally, there is data available on the choices made by pickers – the employees who shop at the store to fulfill customers’ orders. This provides us an opportunity to tailor product similarities toward product substitutions that are tied to business metrics.
Solution: We explored multiple solutions to this problem. The most promising solution, which we will present, leverages features extracted from product text descriptions and images. In this talk, we will discuss how our approach evolved over time and how this cutting-edge self-supervised method is a big improvement over traditional techniques.
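To illustrate the general idea at its simplest — not Loblaw’s production model, which also uses image features and self-supervised training on historical substitution choices — product descriptions can be embedded with a pretrained sentence encoder and candidate substitutes ranked by cosine similarity; the model name and products below are placeholders.

```python
# Minimal sketch: rank substitution candidates for an out-of-stock product by
# cosine similarity of text-description embeddings.
from sentence_transformers import SentenceTransformer, util

products = [
    "2% partly skimmed milk, 4 L",
    "1% partly skimmed milk, 4 L",
    "Whole milk, 2 L",
    "Unsweetened almond beverage, 1.89 L",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(products, convert_to_tensor=True, normalize_embeddings=True)

out_of_stock = 0  # index of the unavailable item
scores = util.cos_sim(embeddings[out_of_stock], embeddings)[0]

ranked = sorted(
    ((float(s), p) for s, p in zip(scores, products) if p != products[out_of_stock]),
    reverse=True,
)
print(ranked[0])  # top substitution candidate and its similarity score
```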
Data Scientist, Manifest Climate
I am a data scientist at Manifest Climate, working on applying machine learning and natural language processing to climate disclosures. Extracting information at scale is paramount to increase transparency in financial markets, so that we can improve decision-making with data-driven climate information.
Talk: Assessing Alignment of Climate Disclosures Using NLP for the Financial Markets
Abstract: Climate-related disclosure is increasing in importance as companies and stakeholders alike aim to reduce their environmental impact and exposure to climate-induced risk. Companies primarily disclose this information in annual or other lengthy documents where climate information is not the sole focus. To assess the quality of a company’s climate-related disclosure, these documents, often hundreds of pages long, must be reviewed manually by climate experts. We propose a more efficient approach to assessing climate-related financial information. We construct a model leveraging TF-IDF, sentence transformers and multi-label k nearest neighbors (kNN). The developed model is capable of assessing alignment of climate disclosures at scale, with a level of granularity and transparency that will support decision-making in the financial markets with relevant climate information.
What You’ll Learn: How an early-stage startup runs machine learning experiments; makes decisions balancing model performance, model explainability, resource constraints, and added business value; and uses deep language models to create the most valuable business opportunities.
Track: Case Study
Technical Level: 5
Location: Toronto
Presenters:
Quoc Tien Au, Data Scientist, Manifest Climate & Aysha Cotterill, Data Analyst, Manifest Climate
About the Speakers:
I am a data scientist at Manifest Climate, working on applying machine learning and natural language processing to climate disclosures. Extracting information at scale is paramount to increase transparency in financial markets, so that we can improve decision-making with data-driven climate information.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
How an early-stage startup runs machine learning experiments; makes decisions balancing model performance, model explainability, resource constraints, and added business value; and uses deep language models to create the most valuable business opportunities.
Abstract of Talk:
Climate-related disclosure is increasing in importance as companies and stakeholders alike aim to reduce their environmental impact and exposure to climate-induced risk. Companies primarily disclose this information in annual or other lengthy documents where climate information is not the sole focus. To assess the quality of a company’s climate-related disclosure, these documents, often hundreds of pages long, must be reviewed manually by climate experts. We propose a more efficient approach to assessing climate-related financial information. We construct a model leveraging TF-IDF, sentence transformers and multi-label k nearest neighbors (kNN). The developed model is capable of assessing alignment of climate disclosures at scale, with a level of granularity and transparency that will support decision-making in the financial markets with relevant climate information.
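A minimal sketch of the modelling approach named in the abstract, on toy data rather than Manifest Climate’s pipeline: TF-IDF features feeding a multi-label k-nearest-neighbours classifier that maps disclosure passages to topics (the labels and passages below are illustrative).

```python
# Minimal sketch: TF-IDF + multi-label kNN for tagging disclosure passages.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

passages = [
    "The board oversees climate-related risks and opportunities.",
    "We disclose Scope 1 and Scope 2 greenhouse gas emissions annually.",
    "Scenario analysis informs our transition risk assessment.",
]
labels = [{"governance"}, {"metrics_targets"}, {"strategy", "risk_management"}]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)  # binary indicator matrix for multi-label targets

clf = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=1))
clf.fit(passages, y)

pred = clf.predict(["Our directors review climate risk twice a year."])
print(mlb.inverse_transform(pred))  # predicted topic set for the new passage
```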
Presenter:
Amish Popli, Data Scientist, SpotHero
About the Speaker:
Amish Popli is passionate about solving challenging business problems using data science and machine learning. He supports multiple departments at SpotHero including, but not limited to, marketing, sales, and product development. He likes data, manipulating it, making it (simulation), modelling it, visualizing it, and yes, even cleaning it. He works with different PMs and engineers in different domains and has brought many successful products from discovery to production.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 4/7)
Are there any industries (in particular) that are relevant for this talk?
Parking
Who is this presentation for?
Product Managers, Data Scientists/ ML Engineers
What you’ll learn:
Experimentation is a very nebulous topic. There are a lot of companies, articles, and research available online, but each company has its own unique way of running experiments and measuring their impact.
Pre-requisite Knowledge:
Knowledge of basic statistical tests
What is unique about this speech, from other speeches given on the topic?
To my knowledge, there is no other company in North America doing data science/ML in the parking industry. The problems we are solving are present in other industries, but parking adds another layer of complexity on top of them.
Abstract of Talk:
SpotHero is the biggest and fastest-growing off-street parking reservation platform in North America. It is a two-sided marketplace involving drivers and parking garage owners. The data science team at SpotHero is working on many interesting problems in areas such as dynamic pricing, marketing, and ranking. One of the key challenges we face is how to test our machine learning models in production and make sure that the changes we make lead to an improvement in our KPIs. In this talk, I will focus on how SpotHero runs experiments whenever we make improvements or create a new model to generate prices for our parking spots. I will cover why the general A/B test framework will not work in our scenario, the various approaches we considered, and introduce switchback experimentation as an alternative. I will discuss our experiment design and conclude the talk with a result from one of our experiments and our technical architecture.
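For readers new to switchback experimentation, here is a toy sketch of the design on synthetic data (not SpotHero’s system): whole time blocks, rather than individual users, alternate between control and treatment, and blocks are the unit of analysis.

```python
# Toy sketch of a switchback experiment: alternate 6-hour blocks between
# control and treatment pricing, then compare block-level means.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = pd.date_range("2022-06-01", periods=14 * 24, freq="H")
blocks = (np.arange(len(hours)) // 6) % 2                 # alternating 6-hour blocks
revenue = rng.normal(loc=100 + 3 * blocks, scale=10)      # treatment adds a small lift

df = pd.DataFrame({
    "hour": hours,
    "variant": np.where(blocks == 1, "treatment", "control"),
    "revenue": revenue,
    "block_id": np.arange(len(hours)) // 6,
})

# Aggregate to blocks first so that blocks (not hours) are the unit of analysis.
block_means = df.groupby(["block_id", "variant"])["revenue"].mean().reset_index()
print(block_means.groupby("variant")["revenue"].mean())
```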
Can you suggest 2-3 topics for post-discussion?
A/B tests, switchback experiments, challenges in running live experiments
Head of Machine Learning Platform, DoorDash
Hien Luu is a Sr. Engineering Manager at DoorDash, leading the Machine Learning Platform team. He is particularly passionate about the intersection between Big Data and Artificial Intelligence. He is the author of the Beginning Apache Spark 3 book. He has given presentations at various conferences such as GHC 2022, Data+AI Summit, XAI 21 Summit, MLOps World, YOW Data!, appy(), QCon (SF, NY, London).
Talk: Scaling & Evolving the Machine Learning Platform at DoorDash
Abstract: As DoorDash’s business grows, online ML prediction volume grows exponentially to support various machine learning use cases, such as ETA predictions, Dasher assignments, personalized restaurant and menu item recommendations, and the ranking of a large volume of search queries.
In this session, we will share our journey of building and scaling our machine learning platform, particularly the prediction service: the various optimizations we experimented with, lessons learned, and the technical decisions and tradeoffs made. We will also share how we measure success and how we set goals for the future.
What You’ll Learn: The challenges and learning lessons from building an ML platform to support ML at scale
Track: Case Study
Technical Level: 5
Location: San Jose, CA
Presenter:
Hien Luu, Head of Machine Learning Platform, DoorDash
About the Speaker:
Hien Luu is a Sr. Engineering Manager at DoorDash, leading the Machine Learning Platform team. He is particularly passionate about the intersection between Big Data and Artificial Intelligence. He is the author of the Beginning Apache Spark 3 book. He has given presentations at various conferences such as GHC 2022, Data+AI Summit, XAI 21 Summit, MLOps World, YOW Data!, appy(), QCon (SF, NY, London).
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 5/7)
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Data Scientists/ ML Engineers, ML Engineers
What you’ll learn:
The journey in scaling the ML platform
On a scale of 1-10 how mature is this applied AI application you plan to discuss?
10/10
Pre-requisite Knowledge:
High level understanding of microservices
What kind of DevOps tools you plan to discuss? Open source?
CI/CD, Git, MLflow
What are some of the languages you plan to discuss?
Python, Kotlin
What are some of the infrastructures you plan to discuss?
Feature engineering at scale, low latency and high QPS model prediction service
What is unique about this speech, from other speeches given on the topic?
This is a case study about our journey of building the ML platform at DoorDash.
Abstract of Talk:
As DoorDash’s business grows, online ML prediction volume grows exponentially to support various machine learning use cases, such as ETA predictions, Dasher assignments, personalized restaurant and menu item recommendations, and the ranking of a large volume of search queries.
In this session, we will share our journey of building and scaling our machine learning platform, particularly the prediction service: the various optimizations we experimented with, lessons learned, and the technical decisions and tradeoffs made. We will also share how we measure success and how we set goals for the future.
Can you suggest 2-3 topics for post-discussion?
Adopting MLOps
Presenters:
Hanieh Arjmand, ML Researcher, Lydia.ai & Spark Tseung, Applied Data Scientist, Lydia.ai
About the Speakers:
Hanieh Arjmand is a Machine Learning Researcher at Lydia.ai where she focuses on discovering and applying the best machine learning techniques to healthcare and insurance problems to help insurers use machine learning to protect more people.
Spark Tseung is an Applied Data Scientist at Knowtions Research where he focuses on building frameworks for actuarial and underwriting validation to help insurers use machine learning to protect more people. Spark is working towards his Ph.D. in Statistics and specializes in the application of machine learning methods in Property & Casualty loss modelling and risk selection.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
Using case studies from our work, we will discuss potential approaches for designing some of the sensitivity tests, which have helped us understand different aspects of model behaviours and data biases.
Abstract of Talk:
Model interpretability is important, especially in regulated industries where risk-sensitive decisions typically require transparency and reliability of the underlying model. While model interpretability is often sacrificed in other fields in order to achieve superior predictive performance, this is not the case in regulated industries such as healthcare, where model fairness plays an important role. In this talk, we will present case studies to illustrate the importance of sensitivity analysis for model interpretability and to showcase our design and implementations. Depending on the use case of a machine learning model, sensitivity tests have to be specifically and carefully designed and implemented. Using our machine learning models on electronic health record (EHR) and human activity data, we will discuss potential approaches for designing some of these sensitivity tests, which have helped us understand different aspects of model behaviour and even uncover unwanted biases and behaviours that had to be eliminated.
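As a minimal illustration of one kind of sensitivity test (using a synthetic model and data, not Lydia.ai’s), one can perturb a single input feature and measure how much the predicted risk moves; features whose perturbations cause unexpectedly large or asymmetric shifts warrant closer inspection for bias.

```python
# Minimal sketch: per-feature sensitivity of a classifier's predicted probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))   # e.g. age, activity level, lab values (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def sensitivity(model, X, feature_idx, delta=0.1):
    """Mean absolute change in predicted probability when one feature shifts by delta."""
    X_shifted = X.copy()
    X_shifted[:, feature_idx] += delta
    return np.abs(model.predict_proba(X_shifted)[:, 1]
                  - model.predict_proba(X)[:, 1]).mean()

for i in range(X.shape[1]):
    print(f"feature {i}: sensitivity = {sensitivity(model, X, i):.4f}")
```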
Presenters:
Kyryl Truskovskyi, Applied Research Scientist, Georgian & Rohit Saha, Applied Research Scientist, Georgian
About the Speakers:
Kyryl has over eight years of experience in the field of Machine Learning. For the bulk of his career, he has helped build machine learning startups, from inception to product. He has also developed expertise in choosing and implementing state-of-the-art deep learning architectures and large-scale solutions based on them.
Rohit Saha is currently an applied research scientist on Georgian’s R&D team, assisting portfolio companies with their research endeavours. Owing to previous roles, he has experience building end-to-end machine learning pipelines. He holds a master’s degree from the University of Toronto, and his research interests include generative modelling and transfer learning for Computer Vision tasks.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 6/7)
What you’ll learn:
The insights and findings we share in this talk are derived from using the latest ML techniques and tools to solve real-world use cases at SPINS and are not readily available on the internet. We also open the stage for Q&A at the end of the talk to address questions from the audience.
Abstract of Talk:
In recent years, we have seen amazing results in artificial intelligence and machine learning owing to the emergence of models such as transformers and pretrained language models. Despite the astounding results published in academic papers, there remains a lot of ambiguity and many challenges when it comes to deploying these models in industry, because 1) troubleshooting, training, and maintaining these models is very time- and cost-consuming due to their inherently large sizes and complexity, and 2) there is not yet enough clarity about when the advantages of these models outweigh their challenges relative to classical ML models. These challenges are even more severe for small and mid-sized companies that do not have access to huge compute resources and infrastructure. In this talk, we discuss these challenges and share our findings and recommendations from working on real-world examples at SPINS, a data/tech company focused on the natural grocery industry. More specifically, we describe how we leverage state-of-the-art language models to seamlessly automate parts of SPINS’ data ingestion workflow and drive substantial business outcomes. We provide a walk-through of our end-to-end MLOps system and discuss how using the right tools and methods has helped mitigate some of these challenges. We also share findings from our experimentation and provide insights on when one should use these massive transformer models instead of classical ML models. Considering that our use cases involve a variety of challenges, from an ill-defined label space to a huge number of classes (~86,000) and massive data imbalance, we believe our findings and recommendations can be applied to most real-world settings. We hope that the learnings from this talk can help you solve your own problems more effectively and efficiently!
Presenters:
Nicolas Venegas Oliva, Technical Lead of Advanced Analytics, LATAM Airlines & Cristóbal Guzmán Wilkendorf, Staff Data Scientist, LATAM Airlines
About the Speakers:
Nicolas has 2 years of experience in backend development, 2+ years in data processing, and the last 3+ years as the Advanced Analytics technical leader at LATAM Airlines. During this time the team has grown from 9 to 48 highly trained professionals. It has also become the team generating the highest impact within the company and a regional reference in MLOps and measured business impact through data products.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
Scaling of MLOps teams, impact measurement, selection and training of highly technical teams.
Abstract of Talk:
For data science teams looking to create real business value with AI, MLOps is not something that’s “nice to have” – it’s a must-have. To make MLOps work for your organization, you need the right tools combined with the right skill set across the different roles, and a unified process. For LATAM Airlines Group, faced with the worst airline industry crisis following the COVID-19 pandemic, MLOps was imperative. We set off to create a cross-company MLOps strategy and implement it across dozens of use cases. In this talk, we will share our MLOps strategy, provide tips for success and pitfalls to avoid based on our own data science journey, and dive into two of our use cases.
Presenter:
Serena McDonnell, Lead Data Scientist, Delphia
About the Speaker:
Serena is a Lead Data Scientist and quant researcher at Delphia, where she uses machine learning to power the fund’s long-short equity market-neutral strategy. Passionate about knowledge sharing and continuous learning, Serena co-hosts Deep Random Talks, a podcast that focuses on machine learning, product development, and knowledge management. She is an organizer of AI Socratic Circles (AISC), a highly technical machine learning reading group for industry professionals. As part of AISC, Serena leads a research group that focuses on applying natural language processing and representation learning to recommender systems. Serena holds an M.Sc. in Mathematics from the Hong Kong University of Science and Technology, and a B.Sc. in Mathematics and Biology from McGill University.
Which talk track does this best fit into?
Case Study
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
– Understand the advantages of alternative data in investing in general.
– Understand the promise of alternative data in quantitative equity strategies, and the challenges.
– Develop an opinion on the value of alternative data, when to invest in it, and when to consider sticking to more traditional data sources.
Abstract of Talk:
Applying alternative data to quantitative equity strategies has high potential and unique challenges. In this talk, we will use Delphia’s machine learning driven long-short equity market neutral strategy as context to discuss the following:
– Case studies to highlight the advantages of alternative data in investing in general.
– The promise of alternative data in quantitative equity strategies.
– The challenges in working with alternative data in Delphia’s strategy.
Data Scientist, Unity Health Toronto
Chloé Pou-Prom is a data scientist with the Data Science and Advanced Analytics (DSAA) team at Unity Health Toronto. The DSAA team uses high quality healthcare data in innovative ways to catalyze communities of data users and decision makers in making transformative changes that improve patient outcomes and healthcare system efficiency.
Co-Presenter: Vaakesan Sundrelingam
Workshop: NLP for Healthcare: Challenges With Processing and De-Identifying Clinical Notes
Abstract: Clinical notes (e.g., admission notes, nurse notes, radiology reports) are rich with information. In this session, we discuss the challenges of working with text data from two different perspectives. First, we provide an overview of the different issues that one can encounter when working with healthcare data, with an emphasis on data processing and cleaning. Then, we focus on the challenges that arise when it comes to sharing data across hospitals, more specifically de-identifying clinical text data. Finally, we provide a demo of pydeid, a Python-based de-identification software that identifies and replaces personal health information (PHI).
What You’ll Learn:
1) Why NLP for healthcare is challenging;
2) Why sharing clinical notes across hospitals is difficult; and
3) Some tips and tools to help out with (1) and (2)
Technical Level: 3
Location: Toronto
Presenters:
Chloe Pou-Prom, Data Scientist, Unity Health Toronto & Vaakesan Sundrelingam, Data Scientist, Unity Health Toronto
About the Speakers:
Chloé Pou-Prom is a data scientist with the Data Science and Advanced Analytics (DSAA) team at Unity Health Toronto. The DSAA team uses high quality healthcare data in innovative ways to catalyze communities of data users and decision makers in making transformative changes that improve patient outcomes and healthcare system efficiency.
Vaakesan Sundrelingam is a data scientist with the GEMINI team at Unity Health Toronto. GEMINI is Canada’s largest hospital data & analytics study, helping physicians, health care teams, and hospitals use data to gain insights into patient care and improve patient outcomes. GEMINI uses machine learning in creative ways to prepare large amounts of data for researchers, as well as in clinical applications such as to detect particularly difficult to measure conditions for quality of care improvement initiatives.
Technical level of your talk?
(Technical Level: 3/7)
What you’ll learn:
1) Why NLP for healthcare is challenging;
2) Why sharing clinical notes across hospitals is difficult; and
3) Some tips and tools to help out with (1) and (2)
Abstract of Talk:
Clinical notes (e.g., admission notes, nurse notes, radiology reports) are rich with information. In this session, we discuss the challenges of working with text data from two different perspectives. First, we provide an overview of the different issues that one can encounter when working with healthcare data, with an emphasis on data processing and cleaning. Then, we focus on the challenges that arise when it comes to sharing data across hospitals, more specifically de-identifying clinical text data. Finally, we provide a demo of pydeid, a Python-based de-identification software that identifies and replaces personal health information (PHI).
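As a much-simplified illustration of rule-based de-identification — not pydeid’s actual API, and real tools also rely on NER models to catch names, addresses, and other free-text identifiers — a handful of regular expressions can already replace common PHI patterns with placeholder tags:

```python
# Simplified sketch of regex-based de-identification of clinical text.
import re

PATTERNS = {
    "<PHONE>": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "<MRN>": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
    "<DATE>": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def deidentify(note: str) -> str:
    """Replace personal health information (PHI) spans with placeholder tags."""
    for tag, pattern in PATTERNS.items():
        note = pattern.sub(tag, note)
    return note

print(deidentify("Pt seen 2022-03-14, MRN 12345678, call 416-555-0199 to follow up."))
# -> "Pt seen <DATE>, <MRN>, call <PHONE> to follow up."
```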
Software Engineer / Data Scientist, Bloomberg
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Workshop: Beyond the Basics: Data Visualization in Python
Abstract: The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically-pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.
While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.
What You’ll Learn: Data visualization is essential for anyone working with data, but sometimes it can be difficult to create impactful visualizations in Python. In this workshop, we will move beyond the plotting basics and explore how to make compelling static, animated, and interactive visualizations.
Technical Level: 4
Location: New York City
Presenter:
Stefanie Molin, Software Engineer / Data Scientist, Bloomberg
About the Speaker:
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Which talk track does this best fit into?
Workshop (1.5-4 hours)
Technical level of your talk?
(Technical level: 4/7)
Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers, Researchers
What you’ll learn:
A workshop gives attendees opportunities to ask questions to make sure they are understanding the concepts. Attendees will also work through curated examples using real-world data, rather than the dummy or randomly generated data found nearly everywhere else. Each of the visualizations is also created step by step, viewing how it changes with each command, which gives attendees a much stronger grasp of the concepts that they can apply elsewhere.
What are the main core message (learning) you want attendees to take away from this talk?
Data visualization is essential for anyone working with data, but sometimes it can be difficult to create impactful visualizations in Python. In this workshop, we will move beyond the plotting basics and explore how to make compelling static, animated, and interactive visualizations.
Pre-requisite Knowledge:
You should have basic knowledge of Python and be comfortable working in Jupyter Notebooks. Check out this notebook for a crash course in Python or work through the official Python tutorial for a more formal introduction. The environment we will use for this workshop comes with JupyterLab, which is pretty intuitive, but be sure to familiarize yourself with using notebooks in JupyterLab and with JupyterLab’s additional functionality. In addition, a basic understanding of pandas will be beneficial but is not required; reviewing the first section of my pandas workshop will be sufficient.
What is unique about this speech, from other speeches given on the topic?
My teaching style is very different: since the code examples I provide are carefully chosen, it’s easy to see why one would take the approach I show, and I make sure that attendees understand exactly what each line of code is doing to make that happen. I find that this gives attendees knowledge that they can apply to other problems, rather than just knowing that the code all together has some effect — they get a deeper understanding and can use the concepts like building blocks for their own use cases. Attendees often praise the content in the slides as a detailed reference for later as well.
Abstract of Talk:
The human brain excels at finding patterns in visual representations, which is why data visualizations are essential to any analysis. Done right, they bridge the gap between those analyzing the data and those consuming the analysis. However, learning to create impactful, aesthetically pleasing visualizations can often be challenging. This session will equip you with the skills to make customized visualizations for your data using Python.
While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine-tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.
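As a flavour of the material, here is a minimal sketch (illustrative only, not the workshop's own notebooks) of the kind of step-by-step Matplotlib customization covered in the session; the data and labels are made up for demonstration:

import matplotlib.pyplot as plt
import numpy as np

# Toy data standing in for a real-world series used in the workshop.
x = np.linspace(0, 10, 200)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x, y, label="signal")

# Annotations are one of the fine-tuning touches discussed in the session.
peak_x = x[np.argmax(y)]
ax.annotate("peak", xy=(peak_x, y.max()), xytext=(peak_x + 1, 0.6),
            arrowprops=dict(arrowstyle="->"))

ax.set_xlabel("time")
ax.set_ylabel("amplitude")
ax.legend()
fig.tight_layout()
plt.show()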
Can you suggest 2-3 topics for post-discussion?
Anything relating to the content covered, building data tools, or writing a book/creating workshops
Co-Founder & CEO, Private AI
Patricia Thaine is the Co-Founder & CEO of Private AI, a Microsoft-backed startup. She is also a Computer Science PhD Candidate at the University of Toronto (on leave) and a Vector Institute alumna. Her R&D work is focused on privacy-preserving natural language processing, with particular interest in applied cryptography and re-identification risk. She also does research on computational methods for lost language decipherment. Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has ten years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics, and the Public Health Agency of Canada.
Workshop: Demystifying De-Identification
Abstract: Workshop with discussion and demo. The session will begin with an overview of privacy enhancing technologies and then dive into de-identification terminology (de-identification, anonymization, redaction, pseudonymization), how these have been misunderstood, and what to think about when choosing between one of these and other privacy enhancing technologies.
Attendees should bring a sample dataset (preferably made up of unstructured text) and have a use case in mind. Each attendee will receive an API key to process a data sample, and we will discuss the results. Data can be in languages other than English; please confirm with the organizer that the language is supported first.
What You’ll Learn: Attendees will learn which privacy-enhancing technologies are best for their use case, when de-identification is right for them, and how to avoid misusing terminology such as "anonymization".
Technical Level: 4
Location: Toronto
Presenter:
Patricia Thaine, Co-Founder & CEO, Private AI
About the Speaker:
Patricia Thaine is the Co-Founder & CEO of Private AI, a Microsoft-backed startup. She is also a Computer Science PhD Candidate at the University of Toronto (on leave) and a Vector Institute alumna. Her R&D work is focused on privacy-preserving natural language processing, with particular interest in applied cryptography and re-identification risk. She also does research on computational methods for lost language decipherment. Patricia is a recipient of the NSERC Postgraduate Scholarship, the RBC Graduate Fellowship, the Beatrice “Trixie” Worsley Graduate Scholarship in Computer Science, and the Ontario Graduate Scholarship. She has ten years of research and software development experience, including at the McGill Language Development Lab, the University of Toronto’s Computational Linguistics Lab, the University of Toronto’s Department of Linguistics, and the Public Health Agency of Canada.
Technical level of your talk?
(Technical Level: 4/7)
What you’ll learn:
Attendees will learn which privacy-enhancing technologies are best for their use case, when de-identification is right for them, and how to avoid misusing terminology such as "anonymization".
Abstract of Talk:
Workshop with discussion and demo. The session will begin with an overview of privacy enhancing technologies and then dive into de-identification terminology (de-identification, anonymization, redaction, pseudonymization), how these have been misunderstood, and what to think about when choosing between one of these and other privacy enhancing technologies.
Attendees should bring a sample dataset (preferably made up of unstructured text) and have a use case in mind. Each attendee will receive an API key to process a data sample, and we will discuss the results. Data can be in languages other than English; please confirm with the organizer that the language is supported first.
ML Lead, Voiceflow
Denys started the ML team at Voiceflow, kickstarted RBC’s MLOps journey, and was the youngest Senior Architect at RBC. He leads discussion groups and mentorship on MLOps and has written various blog posts.
Workshop: Iterating on NLP Models from R&D to Production
Abstract: Research papers, blogs and products are the culmination of many hours of work, iteration and frustration. However, in these final polished formats, we often gloss over the iteration or creative process on how to get to our desired results.
In this talk, I’ll cover a series of short labs that mirror some of the challenges we’ve faced in building out our NLP models and algorithms. It will be an interactive session with collaborative problem solving and explanations of what we built and the process we took along the way.
Some of the twists and turns will include:
– Integrating a BERT-based model with a message queue system
– Speeding up semantic search through vectorization
– Enabling multilingual recommendations
Each attendee will have access to the code examples and will be encouraged to think beyond the challenges addressed and consider how they can apply some of our lessons learned to their own work.
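As a flavour of the "speeding up semantic search through vectorization" lab, here is a hedged sketch using the sentence-transformers library; the model name and toy corpus are assumptions for illustration, not Voiceflow's actual stack:

from sentence_transformers import SentenceTransformer

# Hypothetical choice of encoder and corpus.
model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["reset my password", "update billing details", "talk to an agent"]
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)

query_emb = model.encode("I forgot my login", convert_to_tensor=True,
                         normalize_embeddings=True)

# With normalized embeddings, one matrix product gives cosine similarities
# against the whole corpus at once (the "vectorized" part).
scores = corpus_emb @ query_emb
best = int(scores.argmax())
print(corpus[best], float(scores[best]))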
What You’ll Learn:
– How to go from idea to product
– How to iterate on a product
– How to go to production
– How to incorporate customer feedback
Technical Level: 6
Location: Toronto
Presenter:
Denys Linkov, ML Lead, Voiceflow
About the Speaker:
Denys started the ML team at Voiceflow, kickstarted RBC’s MLOps journey, and was the youngest Senior Architect at RBC. He leads discussion groups and mentorship on MLOps and has written various blog posts.
Technical level of your talk?
(Technical level: 6/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Information Technology & Service, Any startup / large company looking at the R&D process
Who is this presentation for?
Data Scientists/ ML Engineers, ML Engineers, Researchers
What you’ll learn:
Specific examples and challenges of building NLP products
What is the core message (learning) you want attendees to take away from this talk?
– How to go from idea to product
– How to iterate on a product
– How to go to production
– How to incorporate customer feedback
What is unique about this talk, compared to other talks given on the topic?
Nothing radically new, but startups of this size rarely share their experience building and iterating on products. Many tutorials cover the basics rather than real business problems.
Abstract of Talk:
Research papers, blogs and products are the culmination of many hours of work, iteration and frustration. However, in these final polished formats, we often gloss over the iteration or creative process on how to get to our desired results.
In this talk, I’ll cover a series of short labs that mirror some of the challenges we’ve faced in building out our NLP models and algorithms. It will be an interactive session with collaborative problem solving and explanations of what we built and the process we took along the way.
Some of the twists and turns will include:
– Integrating a BERT-based model with a message queue system
– Speeding up semantic search through vectorization
– Enabling multilingual recommendations
Each attendee will have access to the code examples and will be encouraged to think beyond the challenges addressed and consider how they can apply some of our lessons learned to their own work.
Can you suggest 2-3 topics for post-discussion?
BERT based models and embeddings
Deploying models into production
NLP product development
Postdoctoral Fellow, University of Toronto / Machine Learning Researcher, Cyclica
Nasim is a Postdoctoral Fellow at University of Toronto and a Machine Learning Researcher Intern at Cyclica, leading a collaborative project between Cyclica, University of Toronto and Vector Institute. She is the vice-chair of Engineering in Medicine and Biology Society of IEEE Toronto section. Nasim obtained her Ph.D. in electrical and computer engineering from University of Manitoba and has M.Sc. and B.Sc. in biomedical engineering. With her passion for developing and applying novel machine learning techniques for improving the quality of health care, she has conducted numerous research projects on enhancing biomedical imaging for breast cancer detection and monitoring. Her current research is focused on graph-based machine learning models that can predict proteins’ biological functions from their 3D atomic structures, with a promise to enhance designing novel medicines. Nasim is an advocate for women in STEM, serves as vice-chair of IEEE Canada Women in Engineering, and was recognized as a “Visionary Emerging Leader”.
Co-Presenter: Dr. Farnoosh Khodakarami
Workshop: Graph Neural Network Modeling in Drug Discovery Using PyTorch
Abstract: Graph Neural Networks (GNNs) have been among the most popular neural network architectures, and since graphs are a natural representation for proteins and molecules, GNNs have shown great promise in graph-based ML modeling for drug discovery and protein science. Graph-based ML models can help us identify the topology of a protein structure from its sequence, predict a protein’s biological functions from its structure, and identify protein-protein and protein-drug interactions. In this workshop, we will give an introduction to Graph Neural Networks (GNNs) and their applications in drug discovery, followed by a code session on PyTorch Geometric, which is a great PyTorch library for building GNN models for structured data. We will then have a code-based session to walk you through two useful tools built with PyTorch Geometric: TorchDrug and NodeCoder.
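As a taste of the code session, here is a minimal sketch (assuming PyTorch Geometric is installed) of a two-layer GCN trained for node-level classification, the setting used for per-node protein annotations; the toy graph, features, and labels below are invented for illustration:

import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Tiny made-up graph: 4 nodes with 3 features each, connected in a ring.
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long)
y = torch.tensor([0, 1, 0, 1])
data = Data(x=x, edge_index=edge_index, y=y)

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(3, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):
    optimizer.zero_grad()
    out = model(data)                    # per-node class logits
    loss = F.cross_entropy(out, data.y)  # node-level classification loss
    loss.backward()
    optimizer.step()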
What You’ll Learn: The audience will learn about:
– Graph Neural Network (GNN) in drug discovery
– How to build GNN with PyTorch Geometric
– TorchDrug – ML platform for drug discovery
– TorchProtein – an ML library for protein science
– NodeCoder – a graph-based ML framework for predicting proteins’ biological functions
Technical Level: 7
Location: Toronto
Presenters:
Dr. Nasim Abdollahi, Postdoctoral Fellow at University of Toronto, Machine Learning Researcher at Cyclica & Dr. Farnoosh Khodakarami, Computer Scientist & ML Researcher, Cyclica
About the Speaker:
Nasim is a Postdoctoral Fellow at University of Toronto and a Machine Learning Researcher Intern at Cyclica, leading a collaborative project between Cyclica, University of Toronto and Vector Institute. She is the vice-chair of Engineering in Medicine and Biology Society of IEEE Toronto section. Nasim obtained her Ph.D. in electrical and computer engineering from University of Manitoba and has M.Sc. and B.Sc. in biomedical engineering. With her passion for developing and applying novel machine learning techniques for improving the quality of health care, she has conducted numerous research projects on enhancing biomedical imaging for breast cancer detection and monitoring. Her current research is focused on graph-based machine learning models that can predict proteins’ biological functions from their 3D atomic structures, with a promise to enhance designing novel medicines. Nasim is an advocate for women in STEM, serves as vice-chair of IEEE Canada Women in Engineering, and was recognized as a “Visionary Emerging Leader”.
Farnoosh Khodakarami is an experienced computer scientist with a demonstrated history of working in the research industry. She is skilled in application development, with experience in machine learning applications, and is a strong research professional with a Doctor of Philosophy (Ph.D.) in Computer Science. She is creative, self-motivated, and committed to working with a team-player attitude, great problem-solving skills, and the ability to quickly grasp new concepts.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 7/7)
Are there any industries (in particular) that are relevant for this talk?
Hospital & Health Care
What is the core message (learning) you want attendees to take away from this talk?
The audience will learn about:
– Graph Neural Network (GNN) in drug discovery
– How to build GNN with PyTorch Geometric
– TorchDrug – ML platform for drug discovery
– TorchProtein – an ML library for protein science
– NodeCoder – a graph-based ML framework for predicting proteins’ biological functions
Abstract of Talk:
Graph Neural Networks (GNNs) have been among the most popular neural network architectures, and since graphs are a natural representation for proteins and molecules, GNNs have shown great promise in graph-based ML modeling for drug discovery and protein science. Graph-based ML models can help us identify the topology of a protein structure from its sequence, predict a protein’s biological functions from its structure, and identify protein-protein and protein-drug interactions. In this workshop, we will give an introduction to Graph Neural Networks (GNNs) and their applications in drug discovery, followed by a code session on PyTorch Geometric, which is a great PyTorch library for building GNN models for structured data. We will then have a code-based session to walk you through two useful tools built with PyTorch Geometric: TorchDrug and NodeCoder.
Senior Data Scientist Specialist Solution Architect, RedHat Canada
Arthur is a senior data scientist specialist solution architect at RedHat Canada where, with the help of open source software, he helps organizations develop intelligent application ecosystems and bring them into production using MLOps best practices.
He is also pursuing his Ph.D. degree in Computer Science at Concordia University, Montreal, Canada, and is a research assistant in the Software Performance Analysis and Reliability (SPEAR) Lab.
His research interests relate to AIOps, with a focus on performance and scalability optimization.
Workshop: Open Source Intelligent Application Delivery on Kubernetes
Abstract: The recent rise in popularity of containerized workloads demanded better ways to orchestrate and manage these workloads, hence the creation of the Kubernetes platform.
When it comes to running intelligent application workloads that contain built-in AI/ML software components, the requirements on a Kubernetes platform as a service extend beyond agility, portability, flexibility, and scalability: the platform must also answer the data scientist’s dilemma of getting started and getting into production.
However, since the ML code is only a small part of the entire intelligent application ecosystem, this workshop presents a showcase of using a Kubernetes platform and a blueprint architecture that addresses many of the challenges related to the development, deployment, and management of distributed applications.
The user stories we shall focus on in this workshop concerning the developer, data scientist and operations engineer personas are:
– As a data scientist, I want to develop ML models using Jupyter Hub (lab/notebooks) as my preferred research environment.
– As a data scientist, I want my model to be deployed quickly so that it may be used by other applications.
– As a (fullstack) developer, I want to have quick access to resources that support the business logic of my applications, including databases, storage, messaging.
– As a (fullstack) developer, I want an automated build process to support new releases/code updates as soon as they are available in a git repository.
– As an operations engineer, I want an integrated monitoring dashboard for new applications available on the (production) infrastructure.
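As a hedged illustration of the operations-engineer story (not the workshop's actual code), a monitoring-style check might use the official Kubernetes Python client to inspect deployment health:

from kubernetes import client, config

# Assumes a local kubeconfig; inside a cluster, config.load_incluster_config() would be used instead.
config.load_kube_config()
apps = client.AppsV1Api()

# Report how many replicas of each deployment in the "default" namespace are ready.
for dep in apps.list_namespaced_deployment(namespace="default").items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} replicas ready")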
What You’ll Learn: Open source container platforms are a great option to integrate Machine Learning with any application or service by boosting productivity while maintaining a high level of security.
Technical Level: 4
Location: Bossard
Presenter:
Arthur Vitui, Senior Data Scientist Specialist Solution Architect, RedHat Canada
About the Speaker:
Arthur is a senior data scientist specialist solution architect at RedHat Canada where, with the help of open source software, he helps organizations develop intelligent application ecosystems and bring them into production using MLOps best practices.
He is also pursuing his Ph.D. degree in Computer Science at Concordia University, Montreal, Canada, and is a research assistant in the Software Performance Analysis and Reliability (SPEAR) Lab.
His research interests relate to AIOps, with a focus on performance and scalability optimization.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Hospital & Health Care, Information Technology & Service, Insurance
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Product Managers, Data Scientists/ ML Engineers, ML Engineers, Researchers
What you’ll learn:
The audience will learn what an intelligent application is and how to orchestrate its design, deployment, and monitoring in a Kubernetes environment. The audience will also learn about the data scientist’s dilemma and how it may be addressed.
What is the core message (learning) you want attendees to take away from this talk?
Open source container platforms are a great option to integrate Machine Learning with any application or service by boosting productivity while maintaining a high level of security.
Pre-requisite Knowledge:
Generic SDLC and basic Kubernetes knowledge
What is unique about this talk, compared to other talks given on the topic?
Bringing an enterprise perspective and an enterprise-ready Kubernetes platform that goes beyond just a proof of concept (POC), while still presenting a POC showcase for an end-to-end intelligent application.
Abstract of Talk:
The recent rise in popularity of containerized workloads demanded better ways to orchestrate and manage these workloads, hence the creation of the Kubernetes platform.
When it comes to running intelligent application workloads that contain built-in AI/ML software components, the requirements on a Kubernetes platform as a service extend beyond agility, portability, flexibility, and scalability: the platform must also answer the data scientist’s dilemma of getting started and getting into production.
However, since the ML code is only a small part of the entire intelligent application ecosystem, this workshop presents a showcase of using a Kubernetes platform and a blueprint architecture that addresses many of the challenges related to the development, deployment, and management of distributed applications.
The user stories we shall focus on in this workshop concerning the developer, data scientist and operations engineer personas are:
– As a data scientist, I want to develop ML models using Jupyter Hub (lab/notebooks) as my preferred research environment.
– As a data scientist, I want my model to be deployed quickly so that it may be used by other applications.
– As a (fullstack) developer, I want to have quick access to resources that support the business logic of my applications, including databases, storage, messaging.
– As a (fullstack) developer, I want an automated build process to support new releases/code updates as soon as they are available in a git repository.
– As an operations engineer, I want an integrated monitoring dashboard for new applications available on the (production) infrastructure.
Can you suggest 2-3 topics for post-discussion?
– DataScientist Kubernetes Platform as a Service
– Automating builds and exposure of ML models inference endpoints
CTO, ArangoDB
Jörg Schad is CTO at ArangoDB and enjoys working in intersections of (Graph) databases, Cloud Architectures, and Machine Learning. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.
Workshop: Graph ML – The Next Level of Machine Learning
Abstract: This workshop focuses on why Graphs have become one of the biggest trends in Machine Learning. Graph Machine Learning based on Graph Analytic Algorithms is driving significant improvements in Fraud/Anomaly Detection, Ranking (PageRank), Recommendation Engines (collaborative filtering), text summarization, and other NLP tasks. We will cover Graph Analytic Algorithms, their applications, and the more novel, but equally exciting, field of Graph Machine Learning, including topics such as Graph Neural Networks, Graph Embeddings, and applications of Graph Machine Learning.
The workshop will be hands-on, based on Jupyter notebooks, and will cover the following sessions:
– Why Graph and Graph Thinking
– Graph Algorithms
– Graph Embeddings
– Graph Neural Networks
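As a small illustration of the graph algorithms session (illustrative only, not the workshop notebooks), here is PageRank on a tiny transaction-style graph with NetworkX:

import networkx as nx

# Made-up weighted transaction graph between accounts.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("acct_a", "acct_b", 3.0),
    ("acct_b", "acct_c", 1.0),
    ("acct_c", "acct_a", 2.0),
    ("acct_d", "acct_a", 5.0),
])

# Accounts that attract many heavily weighted edges score higher; in fraud or
# ranking settings, unusually central nodes are worth a closer look.
scores = nx.pagerank(G, weight="weight")
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))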
What You’ll Learn: Graph Machine Learning treats relations (and neighborhood context) as first-class citizens and hence can lead to more powerful and simpler machine learning models.
Technical Level: 6
Location: Berlin/San Francisco
Presenter:
Jörg Schad, CTO, ArangoDB
About the Speaker:
Jörg Schad is CTO at ArangoDB and enjoys working in intersections of (Graph) databases, Cloud Architectures, and Machine Learning. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
Are there any industries (in particular) that are relevant for this talk?
All industries
What is the core message (learning) you want attendees to take away from this talk?
Machine learning is much more than just building models, and the overall pipeline should be considered early on in order to achieve actual business impact. Luckily, a number of open-source projects exist to help…
Abstract of Talk:
This workshop focuses on why Graphs have become one of the biggest trends in Machine Learning. Graph Machine Learning based on Graph Analytic Algorithms is driving significant improvements in Fraud/Anomaly Detection, Ranking (PageRank), Recommendation Engines (collaborative filtering), text summarization, and other NLP tasks. We will cover Graph Analytic Algorithms, their applications, and the more novel, but equally exciting, field of Graph Machine Learning, including topics such as Graph Neural Networks, Graph Embeddings, and applications of Graph Machine Learning.
The workshop will be hands-on, based on Jupyter notebooks, and will cover the following sessions:
– Why Graph and Graph Thinking
– Graph Algorithms
– Graph Embeddings
– Graph Neural Networks
Presenter:
Jörg Schad, CTO, ArangoDB
About the Speaker:
Jörg Schad is CTO at ArangoDB and enjoys working in intersections of (Graph) databases, Cloud Architectures, and Machine Learning. In a previous life, he has worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research around distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
Are there any industries (in particular) that are relevant for this talk?
All industries
What is the core message (learning) you want attendees to take away from this talk?
Machine learning is much more than just building models, and the overall pipeline should be considered early on in order to achieve actual business impact. Luckily, a number of open-source projects exist to help…
Abstract of Talk:
Many machine learning projects fail to turn an initial idea, and potentially even a first model, into business impact because they neglect the importance (and associated work) of building a production-grade ML pipeline. There are many great tutorials for training your deep learning models using PyTorch, TensorFlow, Keras, Spark, or one of the many other frameworks, but training is only a small part of the overall deep learning pipeline. This workshop gives an overview of building a complete automated deep learning pipeline, starting with exploratory analysis and continuing through training, model storage, model serving, metadata storage, and monitoring using available open-source tools.
The participants will build an end-to-end data analytics pipeline including:
– Pipeline Orchestration
– Data preparation using Apache Spark
– Jupyter Notebooks
– Distributed training with TensorFlow
– Automation & CI/CD using Jenkins and Argo
– Model and metadata storage
– Model serving and monitoring
Lead Data Scientist, TELUS Business Marketing
With deep expertise in Machine Learning and AI, Mahmudul has over 10 years of industry experience building enterprise-level data products that achieve digital transformation, improve customer experience, create new revenue opportunities, and deliver cost savings for companies across the globe. He is currently serving as a Lead Data Scientist in TELUS Business Marketing. Mahmudul also designed and developed NLP course content for the University of Toronto School of Continuing Studies and serves as an instructor for the same.
Mahmudul holds a Master’s degree in Management Science from University of Waterloo and a Bachelor’s in Computer Science & Engineering.
Workshop: Introduction to NLP & a Step by Step Implementation of a Real World Use Case from TELUS
Abstract: The workshop will be delivered in two parts:
– Part 1: A brief introduction to NLP concepts and ideas, including:
– Basic definitions and use cases
– Why NLP is a different ball game within AI/ML (the major challenges of processing natural language, etc.)
– How those challenges are overcome with ML-based approaches
– The major workflow of building an NLP application
– Part 2: A detailed, code-level implementation of a case study I carried out at TELUS. In this part, the audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning-based project financially successful for the company.
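For orientation only, here is a hedged sketch of one common unsupervised NLP workflow (TF-IDF vectorization followed by k-means clustering of support-style text); it is illustrative and not the specific approach implemented at TELUS:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example documents standing in for real unstructured text.
docs = [
    "internet outage in my area since this morning",
    "question about my last invoice and data overage charges",
    "router keeps dropping the wifi connection",
    "billing was higher than expected this month",
]

# Turn text into TF-IDF vectors, then group similar documents without labels.
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for label, doc in zip(km.labels_, docs):
    print(label, doc)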
What You’ll Learn: The audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning-based project financially beneficial for the business.
Technical Level: 6
Location: Toronto
Presenter:
Mahmudul Hasan, Lead Data Scientist, TELUS Business Marketing
About the Speaker:
With deep expertise in Machine Learning and AI, Mahmudul has over 10 years of industry experience building enterprise-level data products that achieve digital transformation, improve customer experience, create new revenue opportunities, and deliver cost savings for companies across the globe. He is currently serving as a Lead Data Scientist in TELUS Business Marketing. Mahmudul also designed and developed NLP course content for the University of Toronto School of Continuing Studies and serves as an instructor for the same.
Mahmudul holds a Master’s degree in Management Science from University of Waterloo and a Bachelor’s in Computer Science & Engineering.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 6/7)
Are there any industries (in particular) that are relevant for this talk?
Computer Software, Marketing & Advertising, Telecommunications
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Data Scientists/ ML Engineers
What you’ll learn:
The audience will get a real-world case study of how an unsupervised NLP algorithm can successfully create value for a business, plus some tips and tricks that make this kind of project successful for a data scientist.
What is the core message (learning) you want attendees to take away from this talk?
The audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning-based project financially beneficial for the business.
Pre-requisite Knowledge:
A basic understanding of Data Science
What is unique about this talk, compared to other talks given on the topic?
The audience will get an idea of how unstructured data can be converted into financially impactful benefits for the business. I will also share some tips on how to make this kind of unsupervised-learning-based project successful for a big corporation like TELUS.
Abstract of Talk:
The workshop will be delivered in two parts:
Part 1: A brief introduction to NLP concepts and ideas, including:
– Basic definitions and use cases
– Why NLP is a different ball game within AI/ML (the major challenges of processing natural language, etc.)
– How those challenges are overcome with ML-based approaches
– The major workflow of building an NLP application
Part 2: A detailed, code-level implementation of a case study I carried out at TELUS. In this part, the audience will see how a business problem is solved by leveraging unstructured text data with NLP algorithms, along with the tips and tricks that make an unsupervised-learning-based project financially successful for the company.
Can you suggest 2-3 topics for post-discussion?
1. What are the challenges of implementing a data science project in business?
2. How can you make your AI/ML project impactful for the business?
Presenters:
Akbar Nurlybayev, Co-Founder/VP of Engineering, CentML & Xin Li, Research Engineer, CentML & Yubo Gao, Research Engineer, CentML
About the Speakers:
Akbar is the Co-founder and VP of Engineering at CentML. Previously, he was Director of Data at KAR Global, a $2 billion publicly traded company. Yubo and Xin are PhD students in the University of Toronto’s Efficient Computing Lab.
Xin Li is a former member of the AI Technical Staff on the Vector Institute’s AI Engineering team. Working within the vibrant community at the Vector Institute, Xin collaborated with Vector researchers and industry partners to make deep learning research more accessible in applied settings. Currently, he is working as a Research Engineer at CentML.
Yubo Gao recently completed his undergraduate degree at the University of Toronto and has joined the EcoSystem Lab as a PhD student, where he is fortunate to be supervised by Prof. Gennady Pekhimenko.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
This workshop provides unique opportunities for attendees to learn and perform system-level optimizations for deep learning. This topic is often overlooked among ML practitioners due to time and resource constraints. This workshop strives to provide practitioners with some easy-to-use and practical tools to help them understand and optimize their workloads. This workshop also brings a unique perspective on the importance of hardware efficiency when working with Deep Learning models.
Abstract of Talk:
Everybody nowadays trains models, and every year the size of state-of-the-art models grows faster than the hardware becomes cheaper. We have observed that many organizations significantly underutilize their available hardware accelerators, i.e. Nvidia GPUs, and as a result are overpaying for both ML training and inference. In this workshop, our team of world-class ML systems researchers will share various techniques and tools we use to profile and optimize deep learning models. We will demonstrate how the insights learned from profiling can be used to discover optimization opportunities that make deep learning models utilize hardware more efficiently. This results in reduced training time, faster model iteration, and ultimately lower cost for organizations.
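As a minimal sketch of the kind of profiling discussed here (using PyTorch's built-in profiler on a toy model, not CentML's own tooling):

import torch
from torch.profiler import ProfilerActivity, profile

# Toy model and batch; a real workload would profile an actual training step.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU(),
                            torch.nn.Linear(1024, 10))
x = torch.randn(64, 1024)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):
        model(x).sum().backward()

# The summary table shows which operators dominate runtime, i.e. where to optimize first.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))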
Presenter:
Shagun Sodhani, Research Engineer, Meta AI
About the Speaker:
Research Engineer at Meta AI, previously at Mila and Adobe Research
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
By the end of the session, attendees will be able to take a simple PyTorch model and scale it to work with dozens of machines. For straightforward use cases, this requires writing just a few lines of code.
Abstract of Talk:
PyTorch is one of the most popular ML frameworks, with recent releases focusing on enhanced support for distributed training. This talk discusses the different distributed training mechanisms provided by PyTorch. It should be helpful for both practitioners and researchers who want to train larger models faster.
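As a hedged sketch of the "few lines of code" pattern for the straightforward case (wrapping a model in DistributedDataParallel and launching one process per worker, e.g. with torchrun); the toy model and data are placeholders:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets the rank/world-size environment variables for us.
    dist.init_process_group(backend="gloo")  # "nccl" is the usual choice for GPUs

    model = torch.nn.Linear(10, 1)
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # Each process trains on its own shard of data; DDP averages gradients
    # across processes during backward().
    for _ in range(5):
        x, y = torch.randn(32, 10), torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if dist.get_rank() == 0:
        print("final loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 this_script.py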
Presenter:
Eric Hammel, MLOps Engineer, Rocket Science Development
About the Speaker:
A resourceful professional able to bridge Data Science and Infrastructure (Cloud and HPC) skills to deliver valuable solutions. He has experience prototyping, deploying, and monitoring distributed workloads, helping organizations translate real-life business problems into scalable data science solutions that generate value.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
Participants will get a crash course on Kubernetes and Cloud Native concepts. They will learn how to deploy an application on a managed Kubernetes cluster.
Abstract of Talk:
Have you ever wondered what Kubernetes and Cloud Native applications are?
Here is the perfect opportunity to get exposed to these complex yet powerful tools and concepts.
You will discover container orchestration, Cloud Native applications, Kubernetes, and application deployment.
Presenters: Benjamin Ye, Applied Research Scientist, Georgian & Angeline Yasodhara, Applied Research Scientist, Georgian
About the Speaker:
Applied Research Scientists
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
Time series anomaly detection methods and applications
Abstract of Talk:
Traditional methods in time series anomaly detection yield good results for relatively simple tasks, but they often fall short when it comes to harder problems of dealing with long-range dependencies, multivariate time series, and subtle contextual anomalies. We introduce a toolkit incorporating classical and novel machine learning techniques (N-BEATS, Transformers, etc.) as well as recent thresholding methods to overcome these challenges.
We will discuss their benchmark results against different anomaly types for both univariate and multivariate cases. We will walk through how you can use this simple toolkit and easily incorporate these techniques into your application.
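For context, a classical baseline of the kind the toolkit improves on (illustrative only, not the toolkit itself) is a rolling z-score threshold on a univariate series:

import numpy as np
import pandas as pd

# Synthetic series with one injected point anomaly.
rng = np.random.default_rng(0)
series = pd.Series(np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500))
series.iloc[250] += 3.0

# Flag points that deviate strongly from their recent rolling statistics.
window = 50
z = (series - series.rolling(window).mean()) / series.rolling(window).std()
anomalies = series[z.abs() > 4]
print(anomalies)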
Presenters: Eric Huang, Founder & CEO, Advanced Analytics and Research Lab & Michael Woolfson, Client Lead & Development, Advanced Analytics and Research Lab
About the Speaker:
Eric Huang is the Founder and CEO of Advanced Analytics and Research Lab (AAARL.CA), a data science, analytic and AI services and solutions firm. The company helps organizations to fully streamline and utilize data to increase productivity, improve insights and ultimately achieve their goals. Eric has an undergraduate degree in Honors Business Administration, a Master of Science in Analytics from Ivey Business School, as well as an Honors Specialization in Economics from Western University. He has worked in various capacities in consulting, business development, finance, and academia, and has experiences teaching undergraduate and master level students in fun, engaging, and practical ways. Eric is a fun and friendly individual who loves to learn about everything in the world, and is also an avid coffee drinker, barista, photographer and volunteer.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 3/7)
What you’ll learn:
There are blind spots where business and analytics meet; we will go through some common ones and how to resolve them. For organizations new to analytics, the session also provides some know-how and confidence for starting new analytics initiatives.
Abstract of Talk:
This talk will go through practical initiatives to supercharge your existing data and analytics strategies, as well as, for those just starting out, frameworks for how to stand up a data and analytics function. We will go through the following topics: an introduction to areas of application for data and analytics in industry, establishing data and analytics strategies, setting up data and performance tracking, establishing key performance indicators, and establishing a data-driven culture.
Presenter:
Dan Adamson, CEO and Co-Founder, Armilla AI
About the Speaker:
Dan Adamson is the Co-Founder and CEO of Armilla.AI, a company helping institutions create trust in their AI. He co-founded PointChain Technologies, an AI-based neo-banking platform for high-risk industries and was Founder/CEO of OutsideIQ until its acquisition by Exiger, where he remained as their President overseeing product and cognitive computing research. OutsideIQ deployed AML and anti-fraud models to over 100 global financial institutions and built AI solutions for the HR and Insurance industries. He also previously served as Chief Architect at Medstory, a vertical search start-up acquired by Microsoft. Adamson holds several search algorithm and cognitive computing patents, and holds a Master of Science degree from U.C. Berkeley.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 3/7)
Abstract of Talk:
Testing for fairness in AI HR systems: hidden dangers and real-world lessons on how to detect and prevent bias
HR systems can perpetuate biases and represent a significant risk to organizations and harm to candidates. In this tutorial, we will review how to detect bias issues in HR systems, including resume screening and promotion models, with Armilla, a QA-for-ML tool that is being used for formal assessments, including those under the new New York City bias law. We’ll look at hidden biases and common motifs that can cause these systems to fail, as well as suggestions for making these systems more robust.
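As a hedged illustration of one simple check discussed in fairness audits (not Armilla's assessment methodology), the disparate impact ratio can be computed directly from screening outcomes; the groups and outcomes below are made up:

import pandas as pd

# Hypothetical screening outcomes by demographic group.
df = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "B"],
    "selected": [1,   1,   0,   1,   0,   0,   0,   1],
})

selection_rates = df.groupby("group")["selected"].mean()
impact_ratio = selection_rates.min() / selection_rates.max()

print(selection_rates)
print(f"disparate impact ratio: {impact_ratio:.2f}")
# A common rule of thumb (the "four-fifths rule") flags ratios below 0.8.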
Senior Data Scientist, BlackRock
Bhaskarjit is a data scientist who has solved business problems in many domains, including Retail, FMCG, Banking, and Media & Entertainment, using machine learning. Currently he is working as a data scientist at BlackRock, where he builds predictive models for financial markets. His research interests include Network Science, AI Interpretability, Uncertainty, and NLP.
Workshop: Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning
Abstract: Understanding non-linear relationships among financial instruments has various applications in investment processes ranging from risk management, portfolio construction and trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes representing the corresponding pair-wise correlation coefficients. The traditional network science techniques, which are extensively utilized in financial literature, require handcrafted features such as centrality measures to understand such correlation networks.
However, manually enlisting all such handcrafted features may quickly turn out to be a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network in an algorithmic way using a graph machine learning algorithm called Node2Vec.
In particular, the algorithm compresses the network into a lower dimensional continuous space, called an embedding, where pairs of nodes that are identified as similar by the algorithm are placed closer to each other. By using log returns of S&P 500 stock data, we show that our proposed algorithm can learn such an embedding from its correlation network. We define various domain specific quantitative (and objective) and qualitative metrics that are inspired by metrics used in the field of Natural Language Processing (NLP) to evaluate the embeddings in order to identify the optimal one. Further, we discuss various applications of the embeddings in investment management.
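A hedged sketch of the pipeline described above, from log returns to a correlation network to node embeddings; the synthetic prices and the use of the open-source node2vec package are assumptions for illustration, not the exact implementation discussed in the talk:

import networkx as nx
import numpy as np
import pandas as pd
from node2vec import Node2Vec  # pip install node2vec

# Synthetic daily closes, one column per (made-up) ticker.
prices = pd.DataFrame(np.exp(np.cumsum(np.random.randn(500, 5) * 0.01, axis=0)),
                      columns=["AAA", "BBB", "CCC", "DDD", "EEE"])
log_returns = np.log(prices).diff().dropna()
corr = log_returns.corr()

# Weighted graph: nodes are stocks, edge weights are absolute correlations.
G = nx.Graph()
for i in corr.columns:
    for j in corr.columns:
        if i < j:
            G.add_edge(i, j, weight=abs(corr.loc[i, j]))

# Random-walk-based embedding of the correlation network.
n2v = Node2Vec(G, dimensions=16, walk_length=20, num_walks=100)
model = n2v.fit(window=5, min_count=1)
print(model.wv.most_similar("AAA"))  # stocks placed nearest in the embedding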
What You’ll Learn: We show how to create a stock embedding representation from the stock correlation matrix and how to evaluate the learnt embeddings quantitatively.
Pre-requisite Knowledge: Network Science, Machine Learning, Word Embeddings
Technical Level: 5
Location: Delhi
Presenter:
Bhaskarjit Sarmah, Senior Data Scientist, BlackRock
About the Speaker:
Bhaskarjit is a data scientist who has solved business problems in many domains, including Retail, FMCG, Banking, and Media & Entertainment, using machine learning. Currently he is working as a data scientist at BlackRock, where he builds predictive models for financial markets. His research interests include Network Science, AI Interpretability, Uncertainty, and NLP.
Which talk track does this best fit into?
Research: Advanced Technical.
Technical level of your talk?
(Technical level: 4/7)
Are there any industries (in particular) that are relevant for this talk?
Banking & Financial Services, Information Technology & Service, Insurance, Marketing & Advertising
Who is this presentation for?
Senior Business Executives, Product Managers, Data Scientists/ ML Engineers and High-level Researchers, Product Managers, Data Scientists/ ML Engineers, ML Engineers, Researchers
What you’ll learn:
We show how to create a stock embedding representation from the stock correlation matrix and how to evaluate the learnt embeddings quantitatively.
What is the core message (learning) you want attendees to take away from this talk?
How to represent financial securities in the form of embeddings using graph machine learning
Pre-requisite Knowledge:
Network Science, Machine Learning, Word Embeddings
What is unique about this talk, compared to other talks given on the topic?
This talk is centered on feature extraction from networks. I will first introduce traditional handcrafted feature extraction techniques for networks, then explain how graph machine learning can be used for automatic feature extraction in the form of embeddings, and how to evaluate those embeddings quantitatively.
Abstract of Talk:
Understanding non-linear relationships among financial instruments has various applications in investment processes ranging from risk management, portfolio construction and trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes representing the corresponding pair-wise correlation coefficients. The traditional network science techniques, which are extensively utilized in financial literature, require handcrafted features such as centrality measures to understand such correlation networks. However, manually enlisting all such handcrafted features may quickly turn out to be a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network in an algorithmic way using a graph machine learning algorithm called Node2Vec. In particular, the algorithm compresses the network into a lower dimensional continuous space, called an embedding, where pairs of nodes that are identified as similar by the algorithm are placed closer to each other. By using log returns of S&P 500 stock data, we show that our proposed algorithm can learn such an embedding from its correlation network. We define various domain specific quantitative (and objective) and qualitative metrics that are inspired by metrics used in the field of Natural Language Processing (NLP) to evaluate the embeddings in order to identify the optimal one. Further, we discuss various applications of the embeddings in investment management.
Can you suggest 2-3 topics for post-discussion?
Node2Vec, Stock Embeddings, Network Science
Presenters:
Danny Chiao, Tech Lead, Feast & Eddie Esquivel, Sr. Solutions Architect, Tecton & Abhin Chhabra, ML Platform Tech Lead, Shopify
About the Speakers:
Danny Chiao is an engineering lead at Tecton/Feast Inc, working on building a next-generation feature store. Previously, Danny was a technical lead at Google working on end-to-end machine learning problems within Google Workspace, helping build privacy-aware ML platforms and data pipelines and working with research and product teams to deliver large-scale ML-powered enterprise functionality. Danny holds a Bachelor’s degree in Computer Science from MIT.
Eddie Esquivel is a Solutions Architect at Tecton, where he helps customers implement feature stores as part of their stack for Operational ML. Prior to Tecton, Eddie was a Solutions Architect at AWS.
Abhin leads the feature store team for Shopify’s ML Platform.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
Who is this presentation for?
Product Managers, Data Scientists/ ML Engineers, ML Engineers
What you’ll learn:
You will learn how to:
– Build new features
– Automate the transformation of batch data
– Automate the transformation of streaming and real-time data
– Create training datasets
– Serve data online using DynamoDB or Redis
– Build a fraud detection system using Tecton and Feast
Pre-requisite Knowledge:
Attendees should have functional knowledge of Python, SQL and Spark, as well as familiarity with the challenges of data engineering for ML.
What is unique about this talk, compared to other talks given on the topic?
Danny and Eddie are core members of the Feast and Tecton Engineering and Solutions Architect teams. They have deep expertise in working with dozens of end-users to build real-time recommendation systems using feature stores. They also have a lot of experience working on ML infrastructure at Google, AWS, and Tecton.
Abstract of Talk:
In this workshop, we’ll show how to build a real-time fraud detection system using some of the latest tooling for managing ML data pipelines. We’ll walk through the process of building, deploying, and serving real-time data pipelines, highlighting the differences between a traditional feature store (using Feast, the open source feature store) and a feature platform (using Tecton).
We’ll present common architectural patterns and walk you through building a model in three stages:
– Batch, daily computed predictions
– Online predictions using batch features
– Online predictions using real-time features
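For a flavour of the feature-store side, here is a minimal sketch using Feast, the open source feature store mentioned above; the feature view, feature names, and entity key are hypothetical, and a feature repository is assumed to already exist at repo_path:

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Low-latency lookup of precomputed features for one entity at prediction time.
online_features = store.get_online_features(
    features=[
        "user_transaction_stats:txn_count_7d",       # hypothetical feature
        "user_transaction_stats:avg_txn_amount_7d",  # hypothetical feature
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

print(online_features)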
Can you suggest 2-3 topics for post-discussion?
– Best practices for ML recommendation systems
– Building streaming and real-time data pipelines for ML
– Feature Stores: have you implemented one? Let’s share learnings
Assistant Professor, University of Toronto
Annie En-Shiun Lee is an Assistant Professor (Teaching Stream) for the Computer Science Department at the University of Toronto. She received her PhD from the University of Waterloo in 2014 under the supervision of Professor Andrew K. C. Wong and Daniel Stashuk from the Centre of Pattern Intelligence and Machine Intelligence. She has also been a visiting researcher at the Fields Institute (invited by Nancy Reid) and CUHK (invited by K. S. Leung and M. H. Wong) as well as a research scientist at VerticalScope and Stradigi AI.
Workshop: Pre-Trained Multilingual Sequence-to-Sequence Models for NMT: Tips, Tricks and Challenges
Abstract: Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase. Pre-trained multilingual sequence-to-sequence (PMSS) models, such as mBART and mT5, are pre-trained on large general data, then fine-tuned to deliver impressive results for natural language inference, question answering, text simplification and neural machine translation. This tutorial presents 1) An Introduction to Sequence-to-Sequence Pre-trained Models, 2) How to adapt pre-trained models for NMT, 3) Tips and Tricks for NMT training and evaluation, 4) Challenges/Problems faced when using these models. This tutorial will be useful for those interested in NMT, from a research as well as industry point of view.
What You’ll Learn: This tutorial will give an overview of Pre-trained Sequence-to-Sequence Multilingual Models, tips, tricks and frameworks that can be used to adapt these models for NMT especially for low resource languages and the challenges faced while using these models and how to overcome them.
Technical Level: 5
Location: Toronto
Presenter:
Annie En-Shiun Lee, Assistant Professor, University of Toronto
About the Speaker:
Annie En-Shiun Lee is an Assistant Professor (Teaching Stream) for the Computer Science Department at the University of Toronto. She received her PhD from the University of Waterloo in 2014 under the supervision of Professor Andrew K. C. Wong and Daniel Stashuk from the Centre of Pattern Intelligence and Machine Intelligence. She has also been a visiting researcher at the Fields Institute (invited by Nancy Reid) and CUHK (invited by K. S. Leung and M. H. Wong) as well as a research scientist at VerticalScope and Stradigi AI.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical Level: 5/7)
What you’ll learn:
This tutorial will give an overview of Pre-trained Sequence-to-Sequence Multilingual Models, tips, tricks and frameworks that can be used to adapt these models for NMT especially for low resource languages and the challenges faced while using these models and how to overcome them.
Abstract of Talk:
Neural Machine Translation (NMT) has seen a tremendous spurt of growth in less than ten years, and has already entered a mature phase. Pre-trained multilingual sequence-to-sequence (PMSS) models, such as mBART and mT5, are pre-trained on large general data, then fine-tuned to deliver impressive results for natural language inference, question answering, text simplification and neural machine translation. This tutorial presents 1) An Introduction to Sequence-to-Sequence Pre-trained Models, 2) How to adapt pre-trained models for NMT, 3) Tips and Tricks for NMT training and evaluation, 4) Challenges/Problems faced when using these models. This tutorial will be useful for those interested in NMT, from a research as well as industry point of view.
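For orientation, here is a hedged sketch (not the tutorial's own materials) of running inference with one PMSS model via Hugging Face Transformers; the checkpoint and language codes are illustrative choices:

from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model_name = "facebook/mbart-large-50-many-to-many-mmt"
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

tokenizer.src_lang = "en_XX"
inputs = tokenizer("Neural machine translation has matured quickly.",
                   return_tensors="pt")

# Force the decoder to start with the target-language token (French here).
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])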
Presenters:
Ed Shee, Head of Developer Relations, Seldon & Ashley Scillitoe, Data Science Research Engineer, Seldon
About the Speakers:
With a background in cloud computing and a passion for machine learning, Ed has combined those skills and now works in the MLOps field where he heads up Developer Relations at Seldon. Organizer of Tech Ethics London and MLOps London, Ed is heavily involved in lots of developer communities and, thankfully, loves both beer and pizza.
Ashley is a data science research engineer at Seldon, where he works on developing production-ready tools for drift, adversarial and outlier detection. Prior to joining Seldon, he spent a number of years as a Research Fellow at The Alan Turing Institute. Here, he explored the use of machine learning for tackling aerospace engineering problems, with a focus on explainability and uncertainty quantification. Ashley also completed a PhD at the University of Cambridge, and is a keen proponent of open-source software.
Which talk track does this best fit into?
Workshop
What you’ll learn:
What drift detection is, why it’s important and how to get started.
Pre-requisite Knowledge:
No prior knowledge or understanding of drift detection is required (we’ll be covering that) but a basic knowledge of machine learning and some experience with Python will be helpful.
Abstract of Talk:
Although powerful, modern machine learning models can be sensitive. Seemingly subtle changes in a data distribution can destroy the performance of otherwise state-of-the art models, which can be especially problematic when ML models are deployed in production. In this workshop, we will give a hands-on overview to drift detection, the discipline focused on detecting such changes. We will start by building an understanding of the ways in which drift can occur, and why it pays to detect it. We’ll then explore the anatomy of a drift detector, and learn how they can be used to detect drift in a principled manner.
You will work through a real-world example using Alibi Detect, an open-source Python library offering powerful algorithms for adversarial, outlier and drift detection. You’ll learn how to set up drift detectors and deduce what type of drift is occurring. Since data can take many forms, such as image, text or tabular data, you’ll explore how to use existing ML models to pre-process your data into a form suitable for drift detectors. Then, to gain further insights into the causes of drift, you’ll employ state-of-the-art detectors which are able to perform fine-grained attribution to instances and features. To assess whether model performance has been affected by drift, you’ll experiment with using model uncertainty based detectors. Finally, you’ll use a novel context-aware drift detector. This takes in context (or conditioning) variables, allowing you to test for drift conditional on context that is permitted to change. We’ll discuss how this functionality can be crucial in many real-world drift detection scenarios.
This hands-on workshop is targeted at a beginner-intermediate level. No prior knowledge or understanding of drift detection is required (we’ll be covering that) but a basic knowledge of machine learning and some experience with Python will be helpful.
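As a short preview of the hands-on material, a drift detector can be set up in a few lines with Alibi Detect; the synthetic reference and test sets here are made up for illustration:

import numpy as np
from alibi_detect.cd import KSDrift

rng = np.random.default_rng(0)
x_ref = rng.normal(loc=0.0, scale=1.0, size=(500, 5))   # reference data
x_test = rng.normal(loc=0.5, scale=1.0, size=(500, 5))  # shifted test data

# Feature-wise Kolmogorov-Smirnov tests with multiple-testing correction.
detector = KSDrift(x_ref, p_val=0.05)
preds = detector.predict(x_test)

print("drift detected:", bool(preds["data"]["is_drift"]))
print("p-values per feature:", preds["data"]["p_val"])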
Presenters:
Moderator (Roxana Sultan, Chief Data Officer and VP, Health, Vector Institute)
Dr. Benjamin Haibe-Kains, Senior Scientist, University Health Network
Team Fight Tumour (Jun Ma, Postdoctoral Fellow, Vector Institute / Ronald Xie, PhD Candidate, Vector Institute / Rex Ma, PhD Candidate, Vector Institute)
About the Speakers:
Roxana Sultan: Roxana Sultan is the Chief Data Officer and Vice President, Health at the Vector Institute. She leads Vector’s data strategy and its contributions to Ontario’s and Canada’s health sector. Along with our health team and partners, Roxana drives applications of AI to life sciences, fostering research, health sector and industrial sponsor projects, and initiatives to advance the health space, contributing to short-, medium-, and long-term impact achievements within the Ontario health ecosystem.
Roxana is the former Executive Director of the Provincial Council for Maternal and Child Health, where she led the implementation of evidence-based clinical quality improvement and access initiatives in obstetric, neonatal, and pediatric health services across Ontario. Her career includes leadership roles with The Hincks-Dellcrest Centre (now “SickKids Centre for Community Mental Health”), the Princess Margaret Cancer Centre in the University Health Network, the Canadian Institutes of Health Research (CIHR), Cancer Care Ontario, and the Hospital for Sick Children.
As an Adjunct Lecturer with the Institute of Health Policy, Management, and Evaluation (IHPME) at the University of Toronto (U of T), Roxana teaches a graduate-level course on intelligent medicine, machine learning, and knowledge representation. She also serves as the Vice Chair of the Board of the Canadian Cancer Society – Ontario Division.
Roxana completed her graduate education with the Department of Molecular and Medical Genetics at U of T, and holds a Masters of Health Science from IHPME.
Dr. Benjamin Haibe-Kains: Dr. Benjamin Haibe-Kains is a Senior Scientist at the Princess Margaret Cancer Centre: University Health Network, Associate Professor in the Medical Biophysics department of the University of Toronto, and Faculty Affiliate at the Vector Institute. Dr. Haibe-Kains earned his PhD in Bioinformatics at the Université Libre de Bruxelles (Belgium). Supported by a Fulbright Award, he did his postdoctoral fellowship at the Dana-Farber Cancer Institute and Harvard School of Public Health (USA). Dr. Haibe-Kains’ research focuses on the integration of high-throughput data from various sources to simultaneously analyze multiple facets of carcinogenesis. Dr. Haibe-Kains’ team is analyzing large-scale radiological and (pharmaco)genomic datasets to develop new prognostic and predictive models to improve cancer care.
Jun Ma is a Postdoctoral Fellow in the Department of Laboratory Medicine & Pathobiology at the University of Toronto. His research interests focus on the interdisciplinary areas of deep learning and medical image analysis, aiming to develop accurate, fast, and generalizable algorithms to improve healthcare. He has published seven first-author papers in top journals, such as TPAMI, TMI, and MedIA. He is the lead organizer of the MICCAI 2021-2022 FLARE Challenge.
Ronald Xie received his BSc in Microbiology and Immunology at the University of British Columbia in 2018. He then received his MPhil in Computational Biology at the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge in 2019. Ronald is currently a PhD candidate in Computational Biology and Molecular Genetics (CBMG) at the Faculty of Medicine, University of Toronto. His research interests lie in deep learning applications in electron microscopy and single cell omics.
Rex Ma is currently a Computer Science Ph.D. student at the University of Toronto. He is interested in AI in healthcare and computational biology in general, with research focusing on multi-omics integration using machine learning.
Technical level of your talk?
(Technical Level: 5/7)
Pre-requisite Knowledge:
Basic ML
Abstract of Talk:
Radiation therapy planning for head and neck cancer is a time-consuming and complex task for radiologists. AI-based tools have tremendous potential for segmenting regions of interest and optimizing therapy planning. The Vector Institute and the Cancer Digital Intelligence Program (CDI) from the Princess Margaret Cancer Centre launched a Machine Learning Challenge in June 2022 focused on cancer image segmentation.
Building on foundational work from the lab of Dr. Benjamin Haibe-Kains, ten teams from Vector and UHN participated in the Challenge, leveraging RADCURE, the largest head-and-neck cancer treatment dataset of its kind, containing the imaging, treatment, demographic and clinical data of 2745 head and neck cancer patients.
In this presentation, moderated by Vector’s Roxana Sultan, Dr. Haibe-Kains will provide an overview of his preliminary work, and the winning Challenge team, Fight Tumour, will describe their winning submission.
Presenters:
Catalina Herrera, Principal Sales Engineer, Dataiku & Chris Helmus, Senior Sales Engineer, Dataiku
About the Speakers:
With a passion for data and analytics, Catalina Herrera has spent her entire career helping the industry push beyond digitalization to business transformation. She’s held both educational and technical positions, worked with state-of-the-art technology solutions across multiple industry verticals, and served as a data scientist and advanced analytics consultant. Today she works with Fortune 100 companies and global technology leaders on digital transformation initiatives.
Chris Helmus has spent his career helping people and organizations embrace self-service analytics and machine learning. His expertise spans from enabling business users to become data experts to MLOps at scale, with a focus on enabling collaboration. When he’s not working with data you can find Chris at music events in the Denver area.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 6/7)
What you’ll learn:
In this session, you’ll learn how Dataiku’s MLOps framework can help you to:
– Increase agility and solve difficulties in handoffs between business, data scientists, and IT
– Make your models trusted from the get-go (and, therefore, reduce risk)
– Apply model control and approvals to enable, not disable, your AI projects
Pre-requisite Knowledge:
Understanding of MLOps
Abstract of Talk:
According to McKinsey, building ML into processes enables leading organizations to increase their process efficiency by 30% or more while also increasing revenues by up to 10%. However, it’s not that simple. Several blockers prevent organizations from overcoming the difficulties encountered when industrializing AI. As a result, it can take up to nine months for teams to go from the proof of concept stage to production. In this context, how do you remove friction from your MLOps process and make your model processes trusted, agile, and controlled, so that you can finally deliver more value from your analytics and models, faster?
Presenter:
Milan Kordic, Senior Machine Learning Engineer, Tenstorrent
About the Speaker:
Milan is a Senior Machine Learning Engineer at Tenstorrent and a member of the Customer Success team. His role is to support Tenstorrent customers and the community of ML developers using Tenstorrent hardware to successfully build and deploy their AI solutions. With an educational background in Electrical and Computer Engineering and past work experiences as a Machine Learning Engineer, Data Scientist, and Analytics Engineer, Milan has strong knowledge of AI / ML systems and computer hardware architecture.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
– Overview of Tenstorrent’s hardware product line
– End-to-end overview of Tenstorrent’s software stack
– Stages of an AI model from pre-training, fine-tuning, evaluation and inference
– Hands-on demo of the Tenstorrent Model Zoo
Pre-requisite Knowledge:
– Knowledge of deep neural network models used for applications such as NLP and computer vision
– Knowledge of machine learning model training and inference
– Knowledge of computer hardware such as CPU, GPU, AI accelerators, etc. is helpful, but not required
Abstract of Talk:
Tenstorrent AI accelerator hardware is specially designed to accelerate artificial intelligence and machine learning applications, competing on a performance-per-dollar basis. For developers and engineers, having access to efficient AI computing power and an easy-to-use software API is critical for running large-scale and compute-intensive models such as BERT, GPT3, BART, ResNet50, and T5. In this workshop, we will introduce the Tenstorrent developer ecosystem including an overview of the hardware product line, the end-to-end software stack including BUDA, PyBUDA and Model Zoo, the stages of an AI model from pre-training, fine-tuning, evaluation and inference, and a hands-on demo of the Tenstorrent Model Zoo highlighting the key steps developers need to take to get their model running on Tenstorrent AI hardware.
Presenter:
Amber Roberts, Machine Learning Engineer, Arize AI
About the Speaker:
Amber Roberts is a community-oriented Machine Learning Engineer at Arize AI, an ML observability company. In her role at Arize, Amber helps teams across all industries build ML observability into their productionalized AI environments. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 3/7)
What you’ll learn:
In this workshop, you’ll learn best practices for how to:
– Account for model, feature and actuals drift to ensure your models stay relevant
– Troubleshoot performance degradations across various cohorts
– Avoid common pitfalls from misleading evaluation metrics to imbalanced datasets
Abstract of Talk:
Taking a model from research to production is hard — and keeping it there is even harder! As more machine learning models are deployed into production, it is imperative to have tools to monitor, troubleshoot, and explain model decisions. In this workshop attendees will implement ML observability firsthand in the Arize platform to see if their fraud model is drifting, underperforming, and/or exhibiting bias. Participants will monitor, surface, resolve, and improve performance on ML models in production.
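For intuition on the drift monitoring discussed above, one widely used drift metric is the population stability index (PSI) between a training baseline and production data. The sketch below is purely illustrative of the idea and is not Arize’s implementation; bin counts and thresholds are conventions that vary by team.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a production sample."""
    cuts = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=cuts)[0] / len(expected)
    a_pct = np.histogram(actual, bins=cuts)[0] / len(actual)
    # Clip to avoid division by zero / log(0) for empty bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

baseline = np.random.normal(0, 1, 10_000)      # stand-in for a training feature
production = np.random.normal(0.3, 1, 10_000)  # stand-in for the same feature in production
print(f"PSI: {psi(baseline, production):.3f}")  # values above ~0.2 are often read as meaningful drift
```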
Presenter:
Tristan Zajonc, Co-Founder, Continual
About the Speaker:
Tristan is the cofounder of Continual, a startup focused on enabling pervasive operational AI within the enterprise. He was previously the CTO for Machine Learning at Cloudera.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
– An overview of the state-of-the-art of generative AI
– A hands-on experience using generative AI to automate knowledge work.
Abstract of Talk:
With the emergence of large generative AI models such as GPT-3, DALL·E 2, and Stable Diffusion, generative AI is set to revolutionize knowledge work over the next few years. However, applying these models to solve real-world business problems remains a challenge due to the need to align models with human preferences, orchestrate models to address complex use cases, and augment models with human feedback and control. This workshop will provide an overview of the current state of generative AI and a hands-on experience using generative AI to automate knowledge work.
Presenter:
Jim Olsen, Chief Technology Officer, ModelOp
About the Speaker:
Jim Olsen serves as Chief Technology Officer at ModelOp where he leads the technical innovation and design of the ModelOp Center platform. Jim is also integral to advising ModelOp customer CIOs and CTOs on requirements to better support their IT operations as they execute on digital business strategies that often strain technology infrastructure.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
Basics of a model life cycle: What makes up a model life cycle and how do you design one
Governance: Developing an automated governance workflow
Monitoring: How to monitor models post-deployment in a flexible manner
Remediation: Creating remediation workflows that track and accelerate time to resolution
Pre-requisite Knowledge:
Will teach all skills, but some understanding of flow charts is helpful
Abstract of Talk:
In this session, ModelOp CTO Jim Olsen shows you how to design and build a model life cycle, including how to incorporate industry best practices. He also provides considerations for creating the model life cycle, who should be involved, and the types of issues that must be considered.
Presenter:
Marcelo Litovsky, Director of Sales Engineering, Aporia
About the Speaker:
Marcelo Litovsky is an experienced Information Technology professional with 30 years of diverse background in Enterprise Architecture, AI, Systems and Database Management, and Programming. He has worked in multiple industries in his career: Financial Services, Entertainment, and Information Technology. Today, he serves as Director of Sales Engineering at Aporia, bringing his expertise to help Data Scientists, Machine Learning Engineers, and Business Users work together to unlock and promote the business value of their machine learning models. When he is not talking to customers or writing Python code, you can find him at the gym or preparing healthy vegan meals.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
This session will explore the steps you can take to prepare for ML observability. We will also discuss how observability helps data scientists and MLOps practitioners showcase the business value of the applications they deploy and get recognition for their hard work.
Abstract of Talk:
MLOps is bringing a lot of attention to the business impact of Machine Learning. It also introduces new challenges that cannot be efficiently addressed with DevOps. What are these challenges, and what makes MLOps so different from DevOps? They both deal with the life cycle of an application, so what is the difference? Most software applications have a pre-defined behavior. We know the data going in, and we know the data going out. Anything not matching a predefined format or schema is a problem. Machine Learning models follow the same pattern to operate, but their value diminishes as the content of the data changes. We are looking at the schema, format, and patterns describing a change in the data. This is the big difference between DevOps and MLOps: observing the data.
Most organizations have focused on the simplification, automation, and scalability of Machine Learning applications. Observability has taken a back seat. This session will explore the steps you can take to prepare for ML observability. We will also discuss how observability helps data scientists and MLOps practitioners showcase the business value of the applications they deploy and get recognition for their hard work.
Presenter:
James Cameron, Senior AI/ML Solutions Architect, NVIDIA
About the Speaker:
James is a Senior Solutions Architect at NVIDIA, where he works with companies to design, develop, and deploy their AI systems on the edge or in the data center. Previously he was a Team Lead at Patriot One Technologies, where he designed and deployed many production AI/ML systems.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
Whether they are building their first MVP or scaling out to an entire data center, attendees will come away with a better understanding of all stages of bringing an AI system from the lab into the field. Code samples will be shared for performance tuning AI models, system monitoring, and inference serving at scale.
Abstract of Talk:
With more and more companies looking to improve their products and businesses with AI, machine learning engineering is becoming an important task in moving data science from the R&D lab to the field. This workshop will walk through the various stages of creating a production grade AI system, including creating an MVP, scaling/growing systems, and performance tuning. Real world lessons will be shared as tips and tricks around common pitfalls such as sizing hardware requirements, meeting latency targets, and developing MLOps procedures and systems.
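As one example of the performance-tuning work described, a basic steady-state latency benchmark for a model might look like the following. This is an illustrative PyTorch sketch with a hypothetical model, not the workshop’s actual code samples.

```python
import time
import torch

# Hypothetical model and input; in practice this would be your production model and batch size
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
x = torch.randn(32, 512, device=device)

# Warm up to exclude one-time costs (CUDA context creation, kernel selection) from the timings
with torch.no_grad():
    for _ in range(10):
        model(x)

# Measure steady-state latency; synchronize so queued GPU work is included in the wall-clock time
latencies = []
with torch.no_grad():
    for _ in range(100):
        start = time.perf_counter()
        model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        latencies.append((time.perf_counter() - start) * 1000)

lat = sorted(latencies)
print(f"p50={lat[50]:.2f} ms  p95={lat[95]:.2f} ms")
```

Percentile latencies (rather than averages) are usually what matter when checking an inference system against a latency target.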
Presenter:
Kallie Levy, Software Engineer, Superwise
About the Speaker:
Kallie Levy is an ML and data engineer. She started out working on a data-intensive, near-real-time system for the Israeli Defense Forces. Her greatest dev passions are around high-scale data ingestion and handling data lake and warehouse architecture. She works as a software engineer at Superwise, an end-to-end machine learning observability platform, where she currently develops the system’s entire data lake infrastructure.
In her free time, she likes to play sports, especially soccer!
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
See an example of an ML pipeline implementation using Flyte
Deploy model to an endpoint
Define monitoring policies (include some best practices)
Trigger ML pipeline to create a new model based on fresh data
Abstract of Talk:
We’ll take a hands-on dive into implementing the 1st level of MLOps maturity and performing continuous training of the model by automating our ML pipeline. We’ll start with the ML pipeline and see how we can detect performance degradation and data drift in order to trigger the pipeline and create a new model based on fresh data.
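For orientation, an ML pipeline of the kind referenced above can be defined with flytekit’s task and workflow decorators. The sketch below is a hypothetical placeholder (task names, data source, and logic are illustrative), not the workshop’s actual pipeline.

```python
from flytekit import task, workflow

@task
def prepare_data(source: str) -> int:
    # Pull fresh data from the (hypothetical) source and return how many rows were prepared
    return 10_000

@task
def train_model(n_rows: int) -> float:
    # Train on the fresh data and return a validation metric
    return 0.93

@workflow
def retraining_pipeline(source: str = "warehouse://events") -> float:
    n = prepare_data(source=source)
    return train_model(n_rows=n)

if __name__ == "__main__":
    # Workflows can be executed locally during development, then registered to a Flyte cluster,
    # where a monitoring alert (e.g. detected drift) can trigger a new run on fresh data.
    print(retraining_pipeline())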
Presenters:
Rafal Orlowski, Director, Data Science, Scotiabank & Fabio Dutra Sarti, Senior AI/ML Product Manager, Scotiabank
About the Speakers:
Rafal Orlowski is the Director of Data Science at Scotiabank on the Corporate Functions Analytics and AI/ML Solutions team, supporting various AI and ML initiatives across the bank and its subsidiaries. He has been a part of Scotiabank for over 5 years and has worked on a variety of projects in digital, fraud, AML and mobile banking. He has nearly 10 years of hands-on experience in Data Science and holds a Master’s in Economics from the University of Toronto.
Fabio Dutra Sarti is the Sr. AI/ML Product Manager at Scotiabank on the Corporate Functions Analytics and AI/ML Solutions team, supporting the development of the Scotiabank chatbot for the past 1.5 years. Prior to that, he spent 2 years scaling Juliet, WestJet’s virtual assistant, helping hundreds of thousands of customers. Fabio’s experience also includes launching a cryptocurrency exchange in Brazil and a real estate start-up in Boston. He holds a Master of Advanced Management degree from the Yale School of Management.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 2/7)
What you’ll learn:
Examples of application of ML in healthcare; Impact of ML in healthcare technologies on patients; Potential biases in ML in Healthcare technologies
Abstract of Talk:
As organizations grow in size and complexity, they are increasingly leveraging AI to resolve customer inquiries. The following talk outlines how Scotiabank built an in-house chatbot solution, from early strategic planning to launching to customers this year. First, the talk will highlight how a team of data scientists used data to prioritize intents and create a repository of training and testing utterances as a foundation for the NLU (Natural Language Understanding). Second, it will also showcase the collaboration and engagement models between business, product, engineering, content, design, and accessibility to ensure that the chatbot delivers a dynamic conversational experience. Lastly, the talk will highlight how the data science team is leveraging NLP and ML to diagnose the health of the chatbot and identify new topics/data to train the chatbot on.
Presenter:
Alex Kim, Solutions Engineer, Iterative.ai
About the Speaker:
Alex Kim is a Solutions Engineer at Iterative. His background is in physics, software engineering, and machine learning. In the last couple of years, he became increasingly interested in the engineering side of ML projects: processes and tools needed to go from an idea to a production solution.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 4/7)
What you’ll learn:
Learn how to generate many reproducible ML experiments without leaving the context of your IDE
Abstract of Talk:
Learn how to manage your machine learning projects and make them reproducible with the open-source tool DVC and its extension for VS Code.
We will see how to track datasets and models, and how to run, compare, visualize, and track machine learning experiments right in the VS Code IDE.
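As a taste of what this looks like in code, DVCLive (DVC’s Python logger) can record parameters and per-epoch metrics that DVC and the VS Code extension then surface as experiments. The metric names and values below are illustrative, not the session’s actual project.

```python
from dvclive import Live

# Minimal experiment-tracking sketch: log a parameter and a metric per training step
with Live() as live:
    live.log_param("learning_rate", 0.01)
    for epoch in range(3):
        # ... a real training step would go here ...
        live.log_metric("train/accuracy", 0.80 + 0.05 * epoch)
        live.next_step()
```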
Presenters:
Rajiv Shah, Machine Learning Engineer, Hugging Face & Andrew Jardine, Enterprise Account Executive, Hugging Face
About the Speakers:
Rajiv Shah is a leading expert on practical AI. At Hugging Face, his primary focus is on enabling enterprises to succeed with AI. He previously led data science enablement efforts across hundreds of data scientists at DataRobot and has been part of data science teams at Snorkel AI, Caterpillar, and State Farm.
He is a widely recognized speaker on AI, has received many patents, and has published research papers in several domains, including sports analytics, deep learning, and interpretability. He received a Ph.D. and a J.D. from the University of Illinois at Urbana-Champaign.
Andrew is an Account Executive at Hugging Face, where he helps enterprise customers understand how to leverage the 🤗 open-source resources to build state-of-the-art ML. Outside of Hugging Face, Andrew is the Toronto chapter lead for MLOps.Community and has a background in NLP, MLOps and engineering.
Which talk track does this best fit into?
Workshop
Technical level of your talk?
(Technical level: 5/7)
What you’ll learn:
It’s easy to get started building advanced AI applications.
Abstract of Talk:
Transformers have ushered in some of the most innovative and exciting AI technologies, like DALL·E and GitHub’s Copilot. Rajiv shows you how to use open-source tools and models to solve use cases like auto-completion, semantic search, and document AI. He covers the power of embeddings, the emergence of Generative AI, and using transfer learning. He will end by touching on emerging trends around multimodal, multi-task, and large language models. The talk will also incorporate a notebook, code snippets, and paper references.
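For example, the open-source building blocks look roughly like the sketch below; the model choices are illustrative assumptions, not necessarily the ones used in the talk.

```python
from transformers import pipeline

# Auto-completion with a small generative model
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers have ushered in", max_new_tokens=15)[0]["generated_text"])

# Token-level embeddings for semantic search (pool them into one vector per document in practice)
embedder = pipeline("feature-extraction", model="sentence-transformers/all-MiniLM-L6-v2")
vec = embedder("open-source tools for document AI")[0]
print(len(vec), len(vec[0]))  # number of tokens x hidden size
```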
Presenter:
Sarah Sun, Director Data Science, Scotiabank
About the Speaker:
TBD
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
Everything you ever wanted to know about model building at a bank, from conception to implementation, and then some!!
Presenter:
Mandy Gu, Engineering Manager, Wealthsimple
About the Speaker:
Mandy leads the Machine Learning Platform and the Data Platform teams at Wealthsimple. Prior to working on MLOps and infrastructure, she was an NLP researcher in two separate conversational AI roles and a data scientist building models for the operations and client experience spaces.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
From powering money movement to fraud detection, machine learning models are critical to Wealthsimple’s core business process. This year, we built our next generation Machine Learning platform with a simple goal in mind: deploy new ML models within HOURS. This is how we scoped, designed and built our platform in just under 3 months.
Presenter:
Bhavani Rao, Technical Product Marketing Manager, Pachyderm
About the Speaker:
Bhavani Rao is a Technical Product Marketing Manager, responsible for product messaging and positioning at Pachyderm, a leader in data pipelining and MLOps. He has a diverse background, working with customers in Data Ops, DevOps, CI/CD, relational and NoSQL databases. A recent convert to the potential of AI/ML, Bhavani is passionate about technology and how it can be leveraged to solve customer problems. Throughout his career, Bhavani has promoted these learnings and best practices at numerous industry gatherings. He has a B.S. degree in Operations Research from Indiana University and an MBA from Columbia University.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
MLOps is not the same as DevOps. Iteration is a common theme in both methodologies, but the requirements are different. Your pipelines need to version the code AND the data for easy reproducibility and rollback. Given the enormous size of datasets, data pipelines need to scale to petabytes and automatically trigger and process only the new data, rather than executing a complete run every time. Join us for this lightning talk as we discuss what data pipelines are and how to leverage pipelines to quickly converge on an ML model.
Presenter:
Oren Razon, Co-Founder & CEO, Superwise
About the Speaker:
Oren is the co-founder and CEO of Superwise, the leading platform for model observability. With over 15 years of experience leading the development, deployment, and scaling of ML products, Oren is an expert ML practitioner specializing in MLOps tools and practices. Previously, Oren managed machine learning activities at Intel’s ML center and operated a machine learning boutique consulting agency helping leading tech companies such as Sisense, Gong, AT&T, and others, to build their machine learning-based products and infrastructure.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
It’s practically dogma today that a model’s best day in production will be its first day in production. Over time model performance degrades, and there are many variables that can cause decay, from real-world behavior changes to data drifts. When models misbehave, we often turn to retraining to fix the problem, but retraining is not always the best or only solution out there. In this session we’ll take a crash course in alternative techniques.
Presenter:
Geoffrey Hunter, Lead Data Scientist, SpotHero
About the Speaker:
Geoffrey is passionate about forming end-to-end, product-focused Data Science teams that deliver high-impact results. After his postdoc, he was a Data Science consultant at different companies and then moved on to leading Data Science teams. He acts to contextualize Data Science opportunities for senior leadership and then mobilizes and mentors the data science teams to focus on understanding and solving problems.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
Geoffrey will share the 7 questions he asks on a daily basis to rapidly qualify and frame new Data Science problems. This framework can be applied to understand new opportunities as well as to existing problems to help eliminate noise and focus one’s efforts.
Presenter:
Kai Luo, Senior Applied Scientist, Loblaw Digital
About the Speaker:
Kai works as a Senior Applied Scientist at Loblaw Digital, leading development of the core recommender systems used in personalization use cases across all lines of business. Prior to that, he completed a master’s degree at the University of Toronto, with a thesis relating to conversational recommendation.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
Loblaw Companies Ltd is a retailing conglomerate that is the largest grocery retailer in Canada. Its subsidiaries include over 20 supermarket banners, such as Real Canadian Superstore, No Frills, and T&T. E-commerce, particularly for grocery, has become a significant part of the business, and personalization use cases play an important role in that domain. In this talk, Kai will discuss challenges relating to modeling customers’ behaviors and grocery products’ latent representation, and how we iterate our system to solve these challenges.
Presenter:
Rex Lam, Director, Machine Learning Platform, Autodesk
About the Speaker:
Rex Lam leads the Machine Learning Platform team at Autodesk, building platform capabilities and operational tools that support full ML lifecycle development and aim to make ML solutions faster, trusted, and scalable.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
AI/ML represents opportunities for Autodesk to drive insights & innovative solutions in architecture, design, and manufacturing tools.
Presenter:
Robinson Garcia, R&D Project Manager & Technology Specialist, Petrobras
About the Speaker:
Robinson Garcia graduated in Mechanical Engineering (2006) and did his MBA at the Rotman School of Management (2018). He currently works at the Petrobras Research Center (Cenpes), leading cooperation agreements with universities and startups to develop solutions for industrial asset management.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
Valued at 500 million CAD, Asset360 is a solution that improves efficiency and reduces the maintenance backlog of offshore installations at Petrobras. The project started back in 2018, after successful experiments with semantic segmentation and the signature of two cooperation terms with a partner university. We have built a Streetview-like platform and an information extraction solution over the past two years (4,000+ registered users). Currently, we are experimenting with human-in-the-loop learning, recommendation systems, and multi-objective optimization to increase value creation. Our moonshot is to create a two-sided platform that reduces the distance between specialized developer partners (research labs and startups) and internal consumers.
Presenter:
Amber Roberts, Machine Learning Engineer, Arize AI
About the Speaker:
Amber Roberts is a community-oriented Machine Learning Engineer at Arize AI, an ML observability company. In her role at Arize, Amber helps teams across all industries build ML observability into their productionalized AI environments. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile. When Amber isn’t expertly teaching ML observability best practices, you can find her playing with her two puppies, Rusty and Sully, on Florida’s warm beaches.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
From images and video to natural language and audio, unstructured data coupled with machine learning can unlock deeper AI potential and ROI for many organizations and use cases. Embeddings are the core of how deep learning models represent structures and are fundamental to how the next generation of ML models work.
Join this talk to:
– Troubleshoot a sentiment classification model in production
– Learn about emerging techniques like UMAP to transform unstructured data into embeddings that can be more efficiently processed by ML models
– Implement new technologies to monitor and improve models in production
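As a rough illustration of the UMAP step mentioned above, projecting high-dimensional embeddings down to two dimensions for visual inspection can look like the following; random vectors stand in here for real model embeddings, and the parameters are typical defaults rather than recommendations from the talk.

```python
import numpy as np
import umap  # provided by the umap-learn package

# Illustrative stand-in for text embeddings (e.g. 768-d vectors from a language model)
embeddings = np.random.normal(size=(1000, 768))

# Project to 2-D so clusters of similar (or drifting) examples can be inspected visually
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
projected = reducer.fit_transform(embeddings)
print(projected.shape)  # (1000, 2)
```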
Presenter:
Andrea Ruotolo, Global Head, Sustainability / ESG, Rockwell Automation
About the Speaker:
Andrea is the Global Head of Sustainability/ESG at Rockwell Automation, the world’s largest industrial automation company, with responsibility for advancing innovation in sustainability for Rockwell’s customers, which include Fortune 100 companies in energy and manufacturing, representing millions of employees and hundreds of billions of dollars in annual revenues. Andrea is a passionate evangelist for the role AI/ML can play in dramatically improving the sustainability of the industrial sector.
Across her nearly two decades of experience in leading technology innovation, in a career spanning Europe, Asia, and the Americas, Andrea has held multiple senior executive roles focused on applying advanced technologies to solve the sustainability challenge. She has served as co-founder and entrepreneur in smart grid consulting, global lead in the world’s largest engineering services firm in the energy sector, and senior director at a major utility.
In addition to her Fulbright Doctorate in an ESG analysis of sustainable energy systems, Andrea holds a B.A. from the University of La Plata in Argentina, an M.Sci. in Aeronautical and Aerospace Engineering from Madrid Polytechnic, and certification in Digital Business Strategy and AI from MIT Sloan School of Management.
Which talk track does this best fit into?
Lightning Ignite Talk
Abstract of Talk:
The financial and business community have already caught on to the essential importance of sustainability. Investors now call for better practices and reporting on Environment, Social, and Governance, or ESG performance metrics. According to Bloomberg Intelligence, growth in ESG investing is fast becoming the new norm, with ESG investments projected to exceed USD 50 trillion by 2025 – more than 1/3 of all global assets under management. This movement of funds to ESG represents a massive, once-in-a-generation transition to an entirely new economy.
Sustainability is incredibly complex, involving billions of moving parts and decisions. It starts at the edge, where exabytes of data are flowing from real-time sensors and controls in factories and power plants, which aggregate up to the top-level decision makers in companies, which aggregate up to the massive funds that hold portfolios of those companies, and to government regulators and policymakers. AI is critical in analyzing those exabytes of data and enables closed-loop optimization to reduce energy, water, and waste.
In this session, we’ll explore the top 3 needs and opportunities for AI to catalyze change toward more sustainable companies, economies, and societies.
Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, principles, and practices for achieving ROI.
Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: Will have the opportunity to network virtually and meet over 60 Top AI Start-ups and companies during the EXPO & Career Fair.
What is an Ignite Talk?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.
The result is a fun and engaging five-minute presentation.
You can see all our speakers and full agenda here