TY - GEN
T1 - Document segmentation, classification and recognition system
AU - Shih, Frank Y.
AU - Chen, Shy Shyan
AU - Hung, D. C. Douglas
AU - Ng, Peter A.
PY - 1992
Y1 - 1992
N2 - This paper proposes a document segmentation, classification and recognition system for automatically reading office documents received daily that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, the block segmentation step employs a two-step run-length smoothing algorithm to decompose any document into single-mode blocks. Next, the block classification step uses clustering rules to classify each block as text, horizontal or vertical lines, graphics, or pictures. Each text block is separated into isolated characters using projection profiles, and the characters are then translated into ASCII codes by a font- and size-independent character recognition subsystem. Logo pictures are discriminated from half-tone pictures, identified, and converted into symbolic words. Experimental results show that the proposed system can correctly read different styles of mixed-mode printed documents.
AB - This paper proposes a document segmentation, classification and recognition system for automatically reading office documents received daily that have complex layout structures, such as multiple columns and mixed-mode contents of text, graphics and half-tone pictures. First, the block segmentation step employs a two-step run-length smoothing algorithm to decompose any document into single-mode blocks. Next, the block classification step uses clustering rules to classify each block as text, horizontal or vertical lines, graphics, or pictures. Each text block is separated into isolated characters using projection profiles, and the characters are then translated into ASCII codes by a font- and size-independent character recognition subsystem. Logo pictures are discriminated from half-tone pictures, identified, and converted into symbolic words. Experimental results show that the proposed system can correctly read different styles of mixed-mode printed documents.
UR - http://www.scopus.com/inward/record.url?scp=0026962292&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0026962292&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:0026962292
SN - 0818626976
T3 - Proceedings of the Second International Conference on Systems Integration
SP - 258
EP - 267
BT - Proceedings of the Second International Conference on Systems Integration
PB - IEEE
T2 - Proceedings of the Second International Conference on Systems Integration - ICSI'92
Y2 - 15 June 1992 through 18 June 1992
ER -