Mehdi Rezagholizadeh
Principal Research Scientist,
AMD

ABOUT THE SPEAKER:

Mehdi Rezagholizadeh is a Principal Member of the Technical Committee at AMD. Before joining AMD, he was a Principal Research Scientist at Huawei Noah’s Ark Lab Canada, where he had worked since 2017 and led the Canada NLP team for over six years. His research and projects have focused on deep learning and its applications in NLP, computer vision (CV), and speech processing. He has contributed to advances in generative adversarial networks, computational NLP, and efficient training, model architecture, and inference for pre-trained models.

Mehdi holds more than 15 patents and has authored over 50 publications in leading conferences and journals, including TACL, NeurIPS, AAAI, ACL, NAACL, EMNLP, EACL, Interspeech, and ICASSP. Additionally, he has actively contributed to the academic and industrial communities by organizing prominent workshops, such as the NeurIPS Efficient Natural Language and Speech Processing (ENLSP) workshops (2021–2024), and by serving on technical committees for ACL, EMNLP, NAACL, and EACL, including as Area Chair and Senior Area Chair for NAACL 2024. Over his career, he has successfully supervised more than 20 M.Sc. and Ph.D. interns in both industrial and academic settings.

He earned his B.Sc. in 2009 and M.Sc. in 2011 from the University of Tehran and completed his Ph.D. in 2016 at McGill University in Electrical and Computer Engineering (Centre for Intelligent Machines).

TALK TITLE:

Long Context Training and Inference on AMD GPUs

TRACK:

Technical / Engineering Talks

SUB TOPIC:

Fine-Tuning & Training – Safety / Governance / Auditability

ABSTRACT:

Long-context capabilities are becoming essential for modern LLM applications, from document understanding and agent workflows to code, RAG, and multi-step reasoning. But supporting long sequences efficiently is still one of the hardest practical challenges in large-scale model training and inference, especially when memory, bandwidth, and latency become the real bottlenecks rather than raw compute alone. In this talk, I will discuss the key systems and modeling considerations for long-context training and inference on AMD GPUs. I will cover the main sources of cost in long-sequence workloads, including KV-cache growth, attention complexity, memory movement, parallelism strategies, and kernel efficiency. I will also discuss practical approaches to make long-context workloads feasible in production and research settings, including efficient attention variants, context extension strategies, distributed training design, cache optimization, precision choices, and inference-time serving trade-offs. The talk is grounded in a practitioner-focused perspective: what actually matters when moving from promising ideas to stable, scalable implementations on AMD hardware. The goal is to provide a clear view of the design space and a practical roadmap for building efficient long-context systems on modern AMD GPU platforms.

WHAT YOU’LL LEARN:

TBA

Who Attends

2023 Technical Background

Expert/Researcher: 14%
Advanced: 37%
Intermediate: 28%
Beginner: 7%

Business Leaders: C-level executives, project managers, and product owners will explore best practices, methodologies, and principles for achieving ROI.

Engineers, Researchers, Data Practitioners: will gain a better understanding of the challenges, solutions, and ideas presented in breakouts and workshops on Natural Language Processing, Neural Nets, Reinforcement Learning, Generative Adversarial Networks (GANs), Evolution Strategies, AutoML, and more.

Job Seekers: will have the opportunity to network virtually and meet more than 30 top AI companies.

What is an Ignite Talk?

Ignite is an innovative and fast-paced style used to deliver a concise presentation.

During an Ignite Talk, presenters discuss their research using 20 image-centric slides that advance automatically every 15 seconds.

The result is a fun and engaging five-minute presentation.
