MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder

  • Khai Le-Duc
  • Phuc Phan
  • Tan Hanh Pham
  • Bach Phan Tat
  • Minh Huong Ngo
  • Chris Ngo
  • Thanh Nguyen-Tang
  • Truong Son Hy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Multilingual automatic speech recognition (ASR) in the medical domain serves as a foundational task for various downstream applications such as speech translation, spoken language understanding, and voice-activated assistants. This technology improves patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we introduce MultiMed, the first multilingual medical ASR dataset, along with the first collection of small-to-large end-to-end medical ASR models, spanning five languages: Vietnamese, English, German, French, and Mandarin Chinese. To the best of our knowledge, MultiMed stands as the world’s largest medical ASR dataset across all major benchmarks: total duration, number of recording conditions, number of accents, and number of speaking roles. Furthermore, we present the first multilinguality study for medical ASR, which includes reproducible empirical baselines, a monolinguality-multilinguality analysis, an Attention Encoder Decoder (AED) vs. Hybrid comparative study, and a linguistic analysis. We also present practical end-to-end ASR training schemes optimized for a fixed number of trainable parameters, a constraint common in industry settings. All code, data, and models are available online.

Original language: English (US)
Title of host publication: Industry Track
Editors: Georg Rehm, Yunyao Li
Publisher: Association for Computational Linguistics (ACL)
Pages: 1113-1150
Number of pages: 38
ISBN (Electronic): 9798891762886
DOIs
State: Published - 2025
Externally published: Yes
Event: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: Jul 27 2025 to Aug 1 2025

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 6
ISSN (Print): 0736-587X

Conference

Conference: 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/Territory: Austria
City: Vienna
Period: 7/27/25 to 8/1/25

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
