ABOUT THE SPEAKER:
TALK TITLE:
TRACK:
SUB TOPIC:
ABSTRACT:
Modern machine learning and model training pipelines depend on petabytes of multimodal data (images, videos, point clouds, text, and more), yet data I/O and storage remain critical bottlenecks in experimentation and research. This workshop session addresses that gap by introducing Lance, an open-source columnar format designed for ML workloads, and LanceDB, the multimodal retrieval library built on top of Lance. We begin with Lance’s architecture and what makes it uniquely suited to multimodal training: fast random access, native blob storage, and built-in versioning. We then move into live integration examples with PyTorch and Hugging Face Datasets. From there, we work through a 3D world-model dataset case study, discuss benchmarks of I/O performance during data loading, and show how to add derived features such as embeddings and annotations without rewriting existing data and how to scale data loading for distributed training. Attendees will leave with working-level knowledge of a modern, purpose-built open-source format and a practical understanding of how to replace fragmented storage stacks with a single, scalable data layer that keeps their GPUs fed during model training.
WHAT YOU’LL LEARN:
Business Leaders: C-Level Executives, Project Managers, and Product Owners will explore best practices, methodologies, and principles for achieving ROI.
Engineers, Researchers, Data Practitioners: Will gain a better understanding of the challenges, solutions, and ideas presented in breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: Will have the opportunity to network virtually and meet more than 30 top AI companies.
What is an Ignite Talk?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.
The result is a fun and engaging five-minute presentation.