Extracting patient demographics and personal medical information from online health forums

Yang Liu, Songhua Xu, Hong Jun Yoon, Georgia Tourassi

Research output: Contribution to journal › Article › peer-review

10 Scopus citations


Natural language processing has been successfully leveraged to extract patient information from unstructured clinical text. However, most existing work targets a single category of clinical information through isolated efforts. In the midst of the Health 2.0 wave, online health forums increasingly host abundant and diverse health-related information about the demographics and medical information of patients who either actively participate in these forums or are reported on by others there. The potential categories of such information span a wide spectrum, and extracting them requires a systematic and comprehensive approach that goes beyond traditional isolated efforts specializing in single categories. In this paper, we develop a new integrated biomedical NLP pipeline that automatically extracts a comprehensive set of patient demographics and medical information from online health forums. The pipeline can be used to construct structured personal health profiles from unstructured user-contributed content on eHealth social media sites. This paper describes key aspects of the pipeline and reports experimental results showing the system's satisfactory performance on a series of NLP tasks for extracting patient information from online health forums.

Original language: English (US)
Pages (from-to): 1825-1834
Number of pages: 10
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2014

All Science Journal Classification (ASJC) codes

  • General Medicine

