Ahmed Radwan

Machine Learning Specialist, Vector Institute

ABOUT THE SPEAKER:

Ahmed Radwan is a Machine Learning Specialist at the Vector Institute, where his research sits at the intersection of multimodal AI, large language models, and responsible AI. He is the lead developer of SONIC-O1, the first open-source omnimodal benchmark for evaluating multimodal LLMs on real-world audio-video understanding, and the creator of UnBias+, a production-grade open-source toolkit for automated bias detection and debiasing in text. His broader work spans agentic system design, LLM hallucination reduction, and fairness evaluation, with research published in IEEE journals and international AI conferences. He has conducted research across the Vector Institute, York University, and KAUST.

TALK TITLE:

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal LLMs on Audio-Video Understanding

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Evaluation Methods & Capability Benchmarking

ABSTRACT:

Multimodal Large Language Models (MLLMs) are a major focus of recent AI research. However, most prior work targets static image understanding, and the ability of MLLMs to process sequential audio-video data remains underexplored. This gap calls for a high-quality benchmark that systematically evaluates MLLM performance in real-world settings. We introduce SONIC-O1, a comprehensive, fully human-verified benchmark spanning 13 real-world conversational domains, with 4,958 annotations and demographic metadata. SONIC-O1 evaluates MLLMs on key tasks, including open-ended summarization, multiple-choice question (MCQ) answering, and temporal localization with supporting rationales. Experiments on closed- and open-source models reveal clear limitations. While the gap in MCQ accuracy between the two model families is relatively small, we observe a substantial 22.6% performance difference in temporal localization between the best-performing closed-source and open-source models. Performance further degrades across demographic groups, indicating persistent disparities in model behavior. Overall, SONIC-O1 provides an open evaluation suite for temporally grounded and socially robust multimodal understanding.

WHAT YOU’LL LEARN:

TBA

