Hagay Lupesko
Senior Vice President of Engineering,
Cerebras Systems

ABOUT THE SPEAKER:

Hagay is Senior Vice President of AI Inference at Cerebras Systems, where he leads the development of the world’s fastest AI inference service powered by the Cerebras Wafer Scale Engine. He brings over 20 years of experience across software engineering and machine learning, with leadership roles spanning Meta AI, AWS ML, and Databricks Mosaic AI. His work focuses on large-scale AI infrastructure for training and serving state of the art AI models.

TALK TITLE:

Squeezing More Juice Out of Your LLM API: Performance Optimizations and How to Leverage Them

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Inference Serving & Optimization

ABSTRACT:

Most developers use LLM APIs like a black box: pick a model, send requests, and live with whatever performance they get. This talk argues that this is the wrong way to think about performance. Modern inference stacks implement optimizations such as prompt caching, speculative decoding, disaggregated inference, and more. But for those optimization gains to be maximized, the application needs to use the API in the right way. I will explain the key optimizations that matter, how they work at a high level, and what API users can do to fully benefit from them in practice. The focus is not on theory or provider internals. It is on helping practitioners get better real-world performance from the same LLM APIs.

WHAT YOU’LL LEARN:

TBA

Who Attends

Attendees
0 +
Data Practitioners
0 %
Researchers/Academics
0 %
Business Leaders
0 %

2023 Event Demographics

Technical practitioners working directly with ML/AI systems
0 %
Currently Working in Industry*
0 %
Attendees Looking for Solutions
0 %
Currently Hiring
0 %
Attendees Actively Job-Searching
0 %

2023 Technical Background

Expert/Researcher
14%
Advanced
37%
Intermediate
28%
Beginner
7%

2023 Attendees & Thought Leadership

Attendees
0 +
Speakers
0 +
Company Sponsors
0 +

Business Leaders: C-Level Executives, Project Managers, and Product Owners will get to explore best practices, methodologies, principles, and practices for achieving ROI.

Engineers, Researchers, Data Practitioners: Will get a better understanding of the challenges, solutions, and ideas being offered via breakouts & workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: Will have the opportunity to network virtually and meet over 30+ Top Al Companies.

Ignite what is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides which automatically advance every 15 seconds.

The result is a fun and engaging five-minute presentation.

You can see all our speakers and full agenda here

Get our official conference app
For Blackberry or Windows Phone, Click here
For feature details, visit Whova