Word-Sequence Entropy: Towards uncertainty estimation in free-form medical question answering applications and beyond

Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Yue Zhang, Ren Wang, Xiaoshuang Shi, Kaidi Xu

Research output: Contribution to journal › Article › peer-review

Abstract

Uncertainty estimation is crucial for the reliability of safety-critical human and artificial intelligence (AI) interaction systems, particularly in the domain of healthcare engineering. However, a robust and general uncertainty measure for free-form answers has not been well-established in open-ended medical question-answering (QA) tasks, where generative inequality introduces a large number of irrelevant words and sequences within the generated set for uncertainty quantification (UQ), which can lead to biases. This paper proposes Word-Sequence Entropy (WSE), which calibrates uncertainty at both the word and sequence levels based on semantic relevance, highlighting keywords and enlarging the generative probability of trustworthy responses when performing UQ. We compare WSE with six baseline methods on five free-form medical QA datasets, utilizing seven popular large language models (LLMs), and demonstrate that WSE exhibits superior performance in accurate UQ under two standard criteria for correctness evaluation. Additionally, in terms of the potential for real-world medical QA applications, we achieve a significant enhancement (e.g., a 6.36% improvement in model accuracy on the COVID-QA dataset) in the performance of LLMs when employing responses with lower uncertainty that are identified by WSE as final answers, without requiring additional task-specific fine-tuning or architectural modifications.

Original language: English (US)
Article number: 109553
Journal: Engineering Applications of Artificial Intelligence
Volume: 139
State: Published - Jan 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering

Keywords

  • Generative inequality
  • Open-ended medical question-answering
  • Semantic relevance
  • Uncertainty quantification
