Boosting text extraction from biomedical images using text region detection

Songhua Xu, Michael Krauthammer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

In this paper, we show that domain-optimized text detection in biomedical images is important for boosting the recall of text extraction with off-the-shelf OCR engines. Methodologically, we compare OCR performance on raw biomedical images with OCR performance when those images are first preprocessed and OCR is run on the detected text regions only. To quantify OCR extraction results, we rely on a gold-standard image text corpus with manually identified image text strings. To demonstrate the positive effect on biomedical image retrieval, we apply image text detection and extraction to a large corpus of biomedical images in the Yale Image Finder system. We show that improved text extraction results in the retrieval of a larger number of relevant images for a set of domain-relevant keyword searches.

Original language: English (US)
Title of host publication: Proceedings of the 2011 Biomedical Sciences and Engineering Conference
Subtitle of host publication: Image Informatics and Analytics in Biomedicine, BSEC 2011
State: Published - 2011
Externally published: Yes
Event: 2011 Biomedical Sciences and Engineering Conference: Image Informatics and Analytics in Biomedicine, BSEC 2011 - Knoxville, TN, United States
Duration: Mar 15 2011 to Mar 17 2011

Publication series

Name: Proceedings of the 2011 Biomedical Sciences and Engineering Conference: Image Informatics and Analytics in Biomedicine, BSEC 2011

Other

Other: 2011 Biomedical Sciences and Engineering Conference: Image Informatics and Analytics in Biomedicine, BSEC 2011
Country/Territory: United States
City: Knoxville, TN
Period: 3/15/11 to 3/17/11

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Biomedical Engineering
