TY - JOUR
T1 - A new pivoting and iterative text detection algorithm for biomedical images
AU - Xu, Songhua
AU - Krauthammer, Michael
N1 - Funding Information:
This research has been funded by NLM Grant Nos. 5K22LM009255 and 1R01LM009956. Songhua Xu performed this research partly as a Eugene P. Wigner Fellow and staff member at the Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the US Department of Energy under Contract DE-AC05-00OR22725. We thank the authors of Refs. [21–24] for sharing their source code or executables for conducting the performance comparison studies.
PY - 2010/12
Y1 - 2010/12
N2 - There is interest in expanding the reach of literature mining to include the analysis of biomedical images, which often contain a paper's key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating its performance on a set of manually labeled random biomedical images, and compare it against other state-of-the-art text detection algorithms. We demonstrate that our projection histogram-based approach is well suited for text detection in biomedical images, and that the iterative application of the algorithm boosts performance to an F score of 0.60. We provide a C++ implementation of our algorithm, freely available for academic use.
AB - There is interest in expanding the reach of literature mining to include the analysis of biomedical images, which often contain a paper's key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating its performance on a set of manually labeled random biomedical images, and compare it against other state-of-the-art text detection algorithms. We demonstrate that our projection histogram-based approach is well suited for text detection in biomedical images, and that the iterative application of the algorithm boosts performance to an F score of 0.60. We provide a C++ implementation of our algorithm, freely available for academic use.
KW - Biomedical image mining
KW - Histogram analysis for text detection
KW - Pivoting and iterative text region detection
KW - Text detection
UR - http://www.scopus.com/inward/record.url?scp=78649322126&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649322126&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2010.09.006
DO - 10.1016/j.jbi.2010.09.006
M3 - Article
C2 - 20887803
AN - SCOPUS:78649322126
SN - 1532-0464
VL - 43
SP - 924
EP - 931
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 6
ER -