Hagay Lupesko

Senior Vice President of Engineering,

Cerebras Systems

ABOUT THE SPEAKER:

Hagay is Senior Vice President of AI Inference at Cerebras Systems, where he leads the development of the world’s fastest AI inference service powered by the Cerebras Wafer Scale Engine. He brings over 20 years of experience across software engineering and machine learning, with leadership roles spanning Meta AI, AWS ML, and Databricks Mosaic AI. His work focuses on large-scale AI infrastructure for training and serving state-of-the-art AI models.

TALK TITLE:

Squeezing More Juice Out of Your LLM API: Performance Optimizations and How to Leverage Them

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Inference Serving & Optimization

ABSTRACT:

Most developers use LLM APIs like a black box: pick a model, send requests, and live with whatever performance they get. This talk argues that this is the wrong way to think about performance. Modern inference stacks implement optimizations such as prompt caching, speculative decoding, and disaggregated inference. But to get the most out of those optimizations, the application needs to use the API in the right way. I will explain the key optimizations that matter, how they work at a high level, and what API users can do to fully benefit from them in practice. The focus is not on theory or provider internals. It is on helping practitioners get better real-world performance from the same LLM APIs.

WHAT YOU’LL LEARN:

TBA

