Document segmentation, classification and recognition system

Frank Y. Shih, Shy Shyan Chen, D. C. Douglas Hung, Peter A. Ng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

31 Scopus citations

Abstract

This paper proposes a document segmentation, classification and recognition system for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, block segmentation employs a two-step run-length smoothing algorithm to decompose a document into single-mode blocks. Next, block classification uses clustering rules to assign each block to one of the following classes: text, horizontal or vertical lines, graphics, and pictures. Each text block is separated into isolated characters using projection profiles, and the characters are translated into ASCII codes by a font- and size-independent character recognition subsystem. Logo pictures are discriminated from half-tone pictures, identified, and converted into symbolic words. The experimental results show that the proposed system is capable of correctly reading different styles of mixed-mode printed documents.
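The two segmentation ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the paper's two-step smoothing variant, so the code assumes the classic run-length smoothing scheme (a horizontal pass and a vertical pass combined with a logical AND). The function names, the threshold parameters, and the 0/1 pixel convention (1 = black) are all hypothetical choices made for illustration.

```python
def smooth_runs(row, threshold):
    """Fill white runs (0s) of length <= threshold that lie between two black pixels (1s)."""
    out = list(row)
    last_black = None
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None and 0 < i - last_black - 1 <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Assumed classic RLSA: smooth rows and columns separately, then AND the results."""
    horizontal = [smooth_runs(row, h_threshold) for row in image]
    # Transpose, smooth each column as if it were a row, then transpose back.
    smoothed_columns = [smooth_runs(list(col), v_threshold) for col in zip(*image)]
    vertical = [list(row) for row in zip(*smoothed_columns)]
    return [[h & v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def column_profile(block):
    """Vertical projection profile: number of black pixels in each column."""
    return [sum(col) for col in zip(*block)]


def projection_segments(profile):
    """Return (start, end) pairs, end exclusive, for each run of non-zero profile values."""
    segments = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

Under these assumptions, a text line would be split into character candidates by calling `projection_segments(column_profile(block))`, where each returned pair marks the column span of one connected group of inked columns. Touching or kerned characters would need additional handling that this sketch does not attempt.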

Original language: English (US)
Title of host publication: Proceedings of the Second International Conference on Systems Integration
Publisher: IEEE
Pages: 258-267
Number of pages: 10
ISBN (Print): 0818626976
State: Published - 1992
Event: Proceedings of the Second International Conference on Systems Integration - ICSI'92 - Morristown, NJ, USA
Duration: Jun 15 1992 → Jun 18 1992

Publication series

Name: Proceedings of the Second International Conference on Systems Integration

Other

Other: Proceedings of the Second International Conference on Systems Integration - ICSI'92
City: Morristown, NJ, USA
Period: 6/15/92 → 6/18/92

All Science Journal Classification (ASJC) codes

  • General Engineering
