Lin Liu
Director, Data Science,
Wealthsimple

ABOUT THE SPEAKER:

As Director of Data Science at Wealthsimple, Lin Liu architects AI/ML solutions that power the future of finance. His experience includes leading AI/ML consulting engagements for AWS clients at Amazon and creating flagship fraud and credit models for Capital One Canada. A patent-holding inventor in credit scoring, Lin specializes in building scalable AI/ML systems that bridge the gap between data science and tangible business value.

TALK TITLE:

Beyond NLP: Technical Challenges in Building a Foundation Model for Sequential Event Data

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Fine-Tuning & Training – Safety / Governance / Auditability

ABSTRACT:

Foundation models have achieved remarkable success in natural language and vision, but applying the same paradigm to structured, sequential event data — transactions, interactions, behavioural signals — introduces a distinct set of technical challenges that existing literature largely overlooks.

In this talk, we share what we learned building and productionizing a domain-specific foundation model trained on millions of heterogeneous event sequences. We dig into:

  • Tokenization for non-language sequences: Event data mixes categorical fields, continuous values, and irregular timestamps. We explore representations from naive text serialization to structured entity encodings, and the surprising impact tokenization strategy has on downstream performance
  • Architecture trade-offs: Head-to-head comparisons across three approaches — off-the-shelf LLM embeddings, a custom set-aware transformer, and a hybrid fine-tuned LLM — and when each breaks down
  • Multi-objective training: Combining next-event prediction, next-token prediction, and contrastive learning with Matryoshka representation learning for flexible embedding dimensionality without retraining
  • Temporal encoding for irregular time series: Encoding events spaced across seconds to months, unlike NLP’s uniform token positions
  • Inference at scale: From a 10-day batch pipeline to sub-300ms real-time inference for millions of sequences via model distillation, context window management, and SageMaker serving
  • Evaluation beyond perplexity: Why standard LM metrics fail for event prediction, and the framework we built to measure both predictive accuracy and embedding quality
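
To make the tokenization and temporal-encoding bullets above more concrete, here is a minimal sketch of one possible structured encoding. It is an illustration only and not the approach presented in the talk: the event fields (`ts`, `type`, `amount`), the bucket edges, the token names, and the helper functions are all hypothetical. Each event becomes three tokens: a log-scaled bucket for the time gap since the previous event, the categorical event type, and a coarse bucket for the continuous amount.

```python
import math
from datetime import datetime


def bucket_amount(amount, edges=(10, 100, 1000)):
    """Map a continuous value to a coarse bucket token (edges are assumed)."""
    for i, edge in enumerate(edges):
        if amount < edge:
            return f"AMT_{i}"
    return f"AMT_{len(edges)}"


def bucket_gap(seconds):
    """Log-scale bucket for the time gap since the previous event.

    Uses floor(log10(1 + seconds)), so gaps ranging from seconds to
    months fall into a handful of buckets instead of one per second.
    """
    return f"GAP_{int(math.log10(1 + max(seconds, 0)))}"


def tokenize(events):
    """Turn a time-ordered list of event dicts into a flat token list."""
    tokens = []
    prev = None
    for ev in events:
        ts = datetime.fromisoformat(ev["ts"])
        gap = 0 if prev is None else (ts - prev).total_seconds()
        tokens += [bucket_gap(gap), f"TYPE_{ev['type']}", bucket_amount(ev["amount"])]
        prev = ts
    return tokens


# Hypothetical events: 30 seconds apart, then roughly 45 days apart.
events = [
    {"ts": "2024-01-01T09:00:00", "type": "deposit", "amount": 500.0},
    {"ts": "2024-01-01T09:00:30", "type": "trade", "amount": 42.5},
    {"ts": "2024-02-15T14:00:00", "type": "withdrawal", "amount": 2500.0},
]
tokens = tokenize(events)
print(tokens)
```

In this sketch the 30-second gap and the 45-day gap land in different buckets (`GAP_1` and `GAP_6`), which is the kind of irregular spacing that uniform token positions cannot express. A learned or continuous time embedding is an equally plausible alternative to bucketing.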

WHAT YOU’LL LEARN:

TBA

Who Attends

2023 Technical Background

  • Expert/Researcher: 14%
  • Advanced: 37%
  • Intermediate: 28%
  • Beginner: 7%

Business Leaders: C-Level Executives, Project Managers, and Product Owners will explore best practices, methodologies, and principles for achieving ROI.

Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: Will have the opportunity to network virtually and meet 30+ top AI companies.

What is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.

The result is a fun and engaging five-minute presentation.
