Lin Liu

Director, Data Science,

Wealthsimple

ABOUT THE SPEAKER:

As Director of Data Science at Wealthsimple, Lin Liu architects AI/ML solutions that power the future of finance. His experience includes leading AI/ML consulting engagements for AWS clients at Amazon and creating flagship fraud and credit models for Capital One Canada. A patent-holding inventor in credit scoring, Lin specializes in building scalable systems that bridge the gap between data science and tangible business value.

TALK TITLE:

Beyond NLP: Technical Challenges in Building a Foundation Model for Sequential Event Data

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Fine-Tuning & Training – Safety / Governance / Auditability

ABSTRACT:

Foundation models have achieved remarkable success in natural language and vision, but applying the same paradigm to structured, sequential event data — transactions, interactions, behavioural signals — introduces a distinct set of technical challenges that existing literature largely overlooks.

In this talk, we share what we learned building and productionizing a domain-specific foundation model trained on millions of heterogeneous event sequences. We dig into:

Tokenization for non-language sequences: Event data mixes categorical fields, continuous values, and irregular timestamps. We explore representations from naive text serialization to structured entity encodings, and the surprising impact tokenization strategy has on downstream performance
Architecture trade-offs: Head-to-head comparisons across three approaches — off-the-shelf LLM embeddings, a custom set-aware transformer, and a hybrid fine-tuned LLM — and when each breaks down
Multi-objective training: Combining next-event prediction, next-token prediction, and contrastive learning with Matryoshka representation learning for flexible embedding dimensionality without retraining
Temporal encoding for irregular time series: Encoding events whose gaps range from seconds to months, in contrast to the uniformly spaced token positions of NLP
Inference at scale: From a 10-day batch pipeline to sub-300ms real-time inference for millions of sequences via model distillation, context window management, and SageMaker serving
Evaluation beyond perplexity: Why standard LM metrics fail for event prediction, and the framework we built to measure both predictive accuracy and embedding quality
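
To make the tokenization and temporal-encoding points above concrete, here is a minimal sketch of one common way to represent irregular gaps: bucket the time between consecutive events on a log scale and emit a gap token before each event token. This is an illustration only, not the approach presented in the talk. The function names, token format, bucket count, and 180-day cap are all assumptions chosen for the example.

```python
import math
from datetime import datetime

# Hypothetical settings for illustration; not taken from the talk.
N_BUCKETS = 16
MAX_GAP_SECONDS = 180 * 86400  # gaps beyond ~6 months share the last bucket


def gap_bucket(seconds: float) -> int:
    """Map a non-negative time gap to a log-spaced bucket in [0, N_BUCKETS - 1]."""
    if seconds < 0:
        raise ValueError("events must be sorted by timestamp")
    # log1p keeps a zero gap in bucket 0 and compresses month-scale gaps,
    # so seconds and months land in distinct but bounded buckets.
    scaled = math.log1p(seconds) / math.log1p(MAX_GAP_SECONDS)
    return min(N_BUCKETS - 1, int(scaled * N_BUCKETS))


def encode_events(events):
    """Turn time-ordered (timestamp, event_type) pairs into a flat token list.

    Each event contributes a gap token (time since the previous event)
    followed by an event-type token. The first event gets a zero gap.
    """
    tokens = []
    prev = None
    for ts, kind in events:
        gap = 0.0 if prev is None else (ts - prev).total_seconds()
        tokens.append(f"<gap_{gap_bucket(gap)}>")
        tokens.append(f"<evt_{kind}>")
        prev = ts
    return tokens


if __name__ == "__main__":
    history = [
        (datetime(2024, 1, 1, 9, 0, 0), "login"),
        (datetime(2024, 1, 1, 9, 0, 5), "deposit"),
        (datetime(2024, 2, 1, 9, 0, 0), "trade"),
    ]
    print(encode_events(history))
```

Log-spaced buckets are only one option. Continuous encodings of the gap (for example, sinusoidal features of the elapsed time) avoid the bucket boundaries at the cost of a custom embedding layer.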

WHAT YOU’LL LEARN:

TBA
