Adaptive document block segmentation and classification

Frank Y. Shih, Shy Shyan Chen

Research output: Contribution to journal › Article › peer-review

49 Scopus citations

Abstract

This correspondence presents an adaptive block segmentation and classification technique for everyday office documents with complex layouts, such as multiple columns and mixed-mode content combining text, graphics, and pictures. First, an improved two-step block segmentation algorithm based on run-length smoothing decomposes any document into single-mode blocks. Then, a rule-based classifier assigns each block to one of the following types: text, horizontal/vertical line, graphics, or picture. The document features and rules used are independent of character font, character size, and scanning resolution. Experimental results show that our algorithms correctly segment and classify different types of mixed-mode printed documents.

Original language: English (US)
Pages (from-to): 797-802
Number of pages: 6
Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Volume: 26
Issue number: 5
State: Published, 1996

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering
