
Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task

  • Zihao Wu
  • Lu Zhang
  • Chao Cao
  • Xiaowei Yu
  • Zhengliang Liu
  • Lin Zhao
  • Yiwei Li
  • Haixing Dai
  • Chong Ma
  • Gang Li
  • Wei Liu
  • Quanzheng Li
  • Dinggang Shen
  • Xiang Li
  • Dajiang Zhu
  • Tianming Liu

Research output: Contribution to journal, Article, peer-review

Abstract

Recently, ChatGPT and GPT-4 have emerged and gained immense global attention due to their unparalleled performance in language processing. Despite demonstrating impressive capability in various open-domain tasks, their adequacy in highly specific fields like radiology remains untested. Radiology presents unique linguistic phenomena distinct from open-domain data due to its specificity and complexity. Assessing the performance of large language models (LLMs) in such specific domains is crucial not only for a thorough evaluation of their overall performance but also for providing valuable insights into future model design directions: whether model design should be generic or domain-specific. To this end, in this study, we evaluate the performance of ChatGPT/GPT-4 on a radiology natural language inference (NLI) task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation of ChatGPT/GPT-4’s reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) ChatGPT and GPT-4 outperform other LLMs in the radiology NLI task and 2) other specifically fine-tuned BERT-based models require significant amounts of data samples to achieve performance comparable to ChatGPT/GPT-4. These findings not only demonstrate the feasibility and promise of constructing a generic model capable of addressing various tasks across different domains, but also highlight several key factors crucial for developing a unified model, particularly in a medical context, paving the way for future artificial general intelligence (AGI) systems. We release our code and data to the research community.

Original language: English (US)
Pages (from-to): 1027-1041
Number of pages: 15
Journal: IEEE Transactions on Big Data
Volume: 11
Issue number: 3
DOIs
State: Published - 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management

Keywords

  • Large language models
  • natural language inference
  • natural language processing
  • radiology report
