Sarwar Bhuiyan
Solution Engineer, LanceDB

ABOUT THE SPEAKER:

TBA

TALK TITLE:

Enhancing Training Data Pipelines with Lance and the Multimodal Lakehouse

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Fine-Tuning & Training – Safety / Governance / Auditability

ABSTRACT:

Modern machine learning and model training pipelines depend on petabytes of multimodal data (images, videos, point clouds, text and more), yet data I/O and storage remain critical bottlenecks during experimentation and research. This workshop session addresses those bottlenecks by introducing Lance, an open-source columnar format designed for ML workloads, and LanceDB, the multimodal retrieval library built on top of Lance. We begin with Lance’s architecture and what makes it uniquely suited to multimodal training: fast random access, native blob storage, and built-in versioning. We then move into live integration examples with PyTorch and Hugging Face Datasets. From there, we work through a 3D world-model dataset case study, discuss benchmarks of I/O performance during data loading, and show how to add derived features like embeddings and annotations without rewriting existing data and how to scale data loading for distributed training. Attendees will leave with working-level knowledge of how to use a modern, purpose-built open-source format and a practical understanding of how to replace fragmented storage stacks with a single, scalable data layer that keeps their GPUs fed during model training.

WHAT YOU’LL LEARN:

TBA

Who Attends

2023 Technical Background

Expert/Researcher: 14%
Advanced: 37%
Intermediate: 28%
Beginner: 7%

Business Leaders: C-level executives, project managers, and product owners will explore best practices, methodologies, and principles for achieving ROI.

Engineers, Researchers, Data Practitioners: will gain a better understanding of the challenges, solutions, and ideas presented in breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: will have the opportunity to network virtually and meet 30+ top AI companies.

What is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides that advance automatically every 15 seconds.

The result is a fun and engaging five-minute presentation.
