ABOUT THE SPEAKER:
Abhimanyu is a Senior Data Scientist at Elastic, where he works on the development and evaluation of enterprise-grade AI agents. He holds an M.Sc. in Big Data Analytics from Trent University, specializing in natural language processing.
Throughout his career, he has designed and deployed robust AI solutions across a range of industries, including social media, e-commerce, and metals and mining.
TALK TITLE:
TRACK:
SUB TOPIC:
ABSTRACT:
Your agent eval says accuracy improved. But did latency spike? Does your LLM-based metric even agree with human judgment? And is that 5% gain real or noise? Do we ship it or not?
If you’re evaluating AI agents, you’ve likely run into hidden failures like these.
In this session, I’ll walk through how we addressed these at Elastic. Using a real experiment as an example, I’ll cover the evaluation setup we built to catch these failures. This includes multi-metric evaluation to expose tradeoffs (accuracy, tool usage, and latency) and a claim-level correctness evaluator, developed in-house and validated against human judgment, to ensure LLM-based scores are meaningful. I’ll also discuss the key significance-testing principles we used to filter out noise and verify real gains.
Along the way, I’ll show the prompt structure behind our evaluator and examples of practical results.
WHAT YOU’LL LEARN:
Business Leaders: C-level executives, project managers, and product owners will explore best practices, methodologies, and principles for achieving ROI.
Engineers, Researchers, Data Practitioners: Will gain a better understanding of the challenges, solutions, and ideas presented in breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.
Job Seekers: Will have the opportunity to network virtually and meet 30+ top AI companies.
WHAT IS AN IGNITE TALK?
Ignite is an innovative and fast-paced style used to deliver a concise presentation.
During an Ignite Talk, presenters discuss their research using 20 image-centric slides that advance automatically every 15 seconds.
The result is a fun and engaging five-minute presentation.