Optimizing Manual Review Using Machine Learning in Interface Terminology Curation for Automatic EHR Highlighting

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Discharge notes are dense, information-rich documents that contain patient histories, diagnoses, treatments, and clinical observations, as well as post-discharge care instructions. While they provide essential data for clinical decision-making and research, they are often written using abbreviations and complex medical jargon, making them difficult for patients to interpret. Automatic highlighting of discharge notes enhances information accessibility, supports summarization and simplification, and improves clinical interoperability of the notes. Achieving accurate highlighting requires terminologies that include fine-granularity phrases, which existing reference terminologies such as SNOMED CT lack. To address this limitation, in our previous work, we proposed the Cardiology Interface Terminology (CIT), tailored for accurate highlighting of discharge notes of cardiology patients. Candidate concepts to be added to CIT were extracted from notes through concatenation and anchoring operations, with each phrase undergoing automatic and manual review before inclusion in CIT. Manual review of these phrases is a highly time-consuming and costly process. In this study, we propose a Machine Learning (ML)-assisted approach to reduce the manual review effort involved in terminology curation. We trained a Neural Network (NN) model on varying subsets of phrases generated through concatenation and anchoring, to determine the minimum number of phrases that must be manually reviewed to effectively train the ML model to label the remaining phrases automatically. The optimal batch sizes were identified as 6,000 (out of 28,617) for concatenation and 3,000 (out of 9,845) for anchoring. The resulting terminology (CITML2+) achieved a coverage of 68.74% and breadth of 1.6 on the test dataset, closely matching the fully manually curated CIT+ (coverage 70.21%, breadth 1.6), with comparable completeness (97.4% vs. 98.6%) and conciseness (84.1% vs. 83.6%).
These findings demonstrate that substantial reductions in manual review can be achieved without compromising highlighting quality, providing a scalable and efficient framework for curating interface terminologies across diverse medical domains.
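
The size of the reduction follows directly from the batch sizes reported above. The short Python sketch below works through that arithmetic. It assumes that every phrase outside the manually reviewed batch is labeled by the model with no further manual check, which the abstract suggests but does not spell out. The function name `review_share` and the dictionary layout are illustrative only and are not taken from the paper.

```python
def review_share(reviewed: int, total: int) -> float:
    """Return the percentage of phrases that still need manual review."""
    if total <= 0 or not 0 <= reviewed <= total:
        raise ValueError("expected 0 <= reviewed <= total and total > 0")
    return 100.0 * reviewed / total


# (manually reviewed batch, total candidate phrases), as reported in the abstract.
batches = {
    "concatenation": (6_000, 28_617),
    "anchoring": (3_000, 9_845),
}

# Share of each phrase pool that is reviewed by hand.
shares = {name: review_share(r, t) for name, (r, t) in batches.items()}

# Combined share across both operations (assumes the two pools are disjoint).
total_reviewed = sum(r for r, _ in batches.values())
total_phrases = sum(t for _, t in batches.values())
overall = review_share(total_reviewed, total_phrases)

for name, share in shares.items():
    print(f"{name}: {share:.2f}% manually reviewed")
print(f"overall: {overall:.2f}% reviewed, {100 - overall:.2f}% labeled automatically")
```

Under these assumptions, roughly one fifth of the concatenation phrases and just under one third of the anchoring phrases are reviewed by hand, so about three quarters of all candidate phrases are labeled automatically.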

Original language: English (US)
Title of host publication: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Editors: Juan Liu, Jingshan Huang, Xiaowo Wang, Fa Zhang, Xiufen Zou, Tian Tian, Xiaohua Hu, Bin Hu, Yi Xiong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6974-6980
Number of pages: 7
ISBN (Electronic): 9798331515577
State: Published - 2025
Event: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025 - Wuhan, China
Duration: Dec 15 2025 to Dec 18 2025

Publication series

Name: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025

Conference

Conference: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Country/Territory: China
City: Wuhan
Period: 12/15/25 to 12/18/25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Biomedical Engineering
  • Modeling and Simulation
  • Medicine (miscellaneous)
  • Health Informatics

Keywords

  • Discharge notes
  • EHRs
  • Highlighting
  • Interface Terminology
  • Machine Learning
