DI++: A deep learning system for patient condition identification in clinical notes

Jinhe Shi, Xiangyu Gao, William C. Kinsman, Chenyu Ha, Guodong Gordon Gao, Yi Chen

Research output: Contribution to journal › Article › peer-review



Accurately recording a patient's medical conditions in an EHR system is the basis for effectively documenting patient health status, coding for billing, and supporting data-driven clinical decision making. However, patient conditions are often not fully captured in the structured fields of EHR systems and may instead be documented only in unstructured clinical notes. The challenge is that not every disease mention in a clinical note actually refers to one of the patient's conditions. We developed a two-step workflow for identifying a patient's conditions from clinical notes: disease mention extraction followed by disease mention classification. We implemented this workflow in a prototype system, DI++ (for Disease Identification). For disease mention classification in DI++, we developed an advanced deep learning model, the CLSTM-Attention model. Extensive empirical evaluation on about one million pages of de-identified clinical notes demonstrates that DI++ has a significant performance advantage over existing systems in F1 score, Area Under the Curve (AUC) metrics, and efficiency. The proposed CLSTM-Attention model also outperforms existing deep learning models for disease mention classification.
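The two-step workflow described in the abstract (extract disease mentions, then classify each one) can be illustrated with a minimal Python sketch. This is not the DI++ implementation: the lexicon, the cue phrases, and the function names `extract_mentions`, `classify_mention`, and `identify_conditions` are all hypothetical, and a simple cue-phrase rule stands in for the CLSTM-Attention classifier, whose details the abstract does not give.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon; a real system would rely on a clinical vocabulary.
DISEASE_LEXICON = ["diabetes", "hypertension", "pneumonia"]

# Hypothetical cue phrases suggesting a mention is NOT the patient's own condition.
NON_PATIENT_CUES = ["no evidence of", "denies", "family history of", "rule out"]


@dataclass
class Mention:
    disease: str   # lexicon term that matched
    start: int     # character offset of the match in the note
    context: str   # lowercased text from the sentence start up to the match


def extract_mentions(note: str) -> list[Mention]:
    """Step 1 (assumed form): find every lexicon term in the note."""
    mentions = []
    for term in DISEASE_LEXICON:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            left = note[: match.start()]
            # Keep only the text since the last period or newline, so cues
            # from earlier sentences do not leak into this mention.
            sentence_start = max(left.rfind("."), left.rfind("\n")) + 1
            mentions.append(
                Mention(term, match.start(), left[sentence_start:].lower())
            )
    return sorted(mentions, key=lambda m: m.start)


def classify_mention(mention: Mention) -> bool:
    """Step 2 (placeholder): True if the mention looks like a patient condition.

    A learned classifier would replace this cue-phrase rule.
    """
    return not any(cue in mention.context for cue in NON_PATIENT_CUES)


def identify_conditions(note: str) -> list[str]:
    """Run extraction, then keep only mentions classified as patient conditions."""
    return [m.disease for m in extract_mentions(note) if classify_mention(m)]


if __name__ == "__main__":
    note = (
        "Patient has hypertension. Family history of diabetes. "
        "No evidence of pneumonia."
    )
    print(identify_conditions(note))  # prints ['hypertension']
```

The example note shows why the second step matters: all three diseases are mentioned, but only one refers to a condition the patient actually has.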

Original language: English (US)
Article number: 102224
Journal: Artificial Intelligence in Medicine
State: Published - Jan 2022

All Science Journal Classification (ASJC) codes

  • Medicine (miscellaneous)
  • Artificial Intelligence


Keywords

  • Clinical notes
  • Concept extraction
  • Deep learning
  • Deep neural network
  • Disease mention extraction
  • Natural language processing (NLP)
  • Patient condition classification

