Motivation: Diagrams embedded in the biomedical literature convey rich contents, which often concisely and intuitively highlight key thesis of a research article. Despite their vital importance and informative clues for biomedical literature navigation and retrieval; currently, we miss an effective computational method for automatically understanding and accessing these valuable resources. ProposedMethod: To address the aforementioned gap,we propose a novel context-based algorithm for estimating the similarity between a pair of biomedical diagrams. The main difference of the proposed algorithmwith respect to the existingmethods lies in the new algorithm's incorporation of the semantic context associated with diagrams in their source documents into the diagram similarity estimation process. In addition, the new approach also performs a series of advanced image processing and text mining operations to comprehensively extract the semantic content graphically encoded inside diagram images. Results: The new algorithm can be deployed as a reusable component providing a fundamental function for building many advanced, semantic-aware applications on biomedical diagram processing. As a case study, in our experiments, we demonstrate the advantage of the new algorithm for diagram retrieval. A set of biomedical diagram search and ranking experiments were conducted, where the performance of the new method was compared with that of five peer methods. The comparison results demonstrate the performance superiority of the new algorithm with all peer methods with statistical significance.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics