TY - CONF
T1 - Boosting text extraction from biomedical images using text region detection
AU - Xu, Songhua
AU - Krauthammer, Michael
PY - 2011
Y1 - 2011
AB - In this paper, we show that domain-optimized text detection in biomedical images is important for boosting text extraction recall via off-the-shelf OCR engines. Methodologically, we contrast OCR performance when processing raw biomedical images, compared to preprocessing those images, and performing OCR on detected image text regions only. To quantify OCR extraction results, we rely on a gold standard image text corpus with manually identified image text strings. To demonstrate the positive effect on biomedical image retrieval, we apply image text detection and extraction to a large corpus of biomedical images in the Yale Image Finder system. We show that improved text extraction results in the retrieval of a larger number of relevant images for a set of domain-relevant keyword searches.
UR - http://www.scopus.com/inward/record.url?scp=79959866522&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959866522&partnerID=8YFLogxK
U2 - 10.1109/BSEC.2011.5872319
DO - 10.1109/BSEC.2011.5872319
M3 - Conference contribution
AN - SCOPUS:79959866522
SN - 9781612844107
T3 - Proceedings of the 2011 Biomedical Sciences and Engineering Conference: Image Informatics and Analytics in Biomedicine, BSEC 2011
BT - Proceedings of the 2011 Biomedical Sciences and Engineering Conference
T2 - 2011 Biomedical Sciences and Engineering Conference: Image Informatics and Analytics in Biomedicine, BSEC 2011
Y2 - 15 March 2011 through 17 March 2011
ER -