Tool for classifying office documents

Xiaolong Hao, Jason T. Wang, Michael P. Bieber, Peter A. Ng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

This paper presents the design of a tool for classifying office documents. We represent a document's layout structure using an ordered labeled tree, called the 'layout structure tree' (L-S-Tree), based on a nested segmentation procedure. The tool uses a sample-based approach for learning where concepts are learned by retaining samples and new documents are classified by matching their L-S-Trees with samples. The matching process involves both computing the edit distance between two trees using a previously developed pattern matching toolkit, and calculating the degree of conceptual closeness between the documents and samples. Our experimental results show that the tool is capable of classifying various types of office documents, even with very few samples in the sample base.
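The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or their pattern matching toolkit: the tree encoding, the unit edit costs, the function names (`forest_distance`, `tree_distance`, `classify`), and the sample layouts are all assumptions made for illustration, and the "degree of conceptual closeness" is not modeled at all. The sketch only shows nearest-sample classification using the standard edit distance between ordered labeled trees.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees. A forest is a tuple of trees. Tuples keep everything hashable.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests with unit costs (assumed)."""
    if not f:
        return size(g)          # insert every node of g
    if not g:
        return size(f)          # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children take its place.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_children, w_children)
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_distance(a, b):
    return forest_distance((a,), (b,))

def classify(tree, samples):
    """Return the class name of the sample tree closest to `tree`."""
    return min(samples, key=lambda s: tree_distance(tree, s[1]))[0]

# Hypothetical sample base: one layout tree per document type.
letter = ("page", (("letterhead", ()), ("date", ()),
                   ("body", ()), ("signature", ())))
memo = ("page", (("header", (("to", ()), ("from", ()), ("subject", ()))),
                 ("body", ())))
samples = [("letter", letter), ("memo", memo)]

# A new document whose layout lacks a signature block.
new_doc = ("page", (("letterhead", ()), ("date", ()), ("body", ())))
print(classify(new_doc, samples))
```

Here the new document differs from the letter sample by a single inserted node, so it is assigned to that class even though the sample base holds only one tree per type.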

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Tools with Artificial Intelligence
Editors: Anon
Publisher: Publ by IEEE
Pages: 427-434
Number of pages: 8
ISBN (Print): 0818642009
State: Published - Dec 1 1993
Event: Proceedings of the 5th International Conference on Tools with Artificial Intelligence TAI '93 - Boston, MA, USA
Duration: Nov 8 1993 - Nov 11 1993

Publication series

Name: Proceedings of the International Conference on Tools with Artificial Intelligence
ISSN (Print): 1063-6730

Other

Other: Proceedings of the 5th International Conference on Tools with Artificial Intelligence TAI '93
City: Boston, MA, USA
Period: 11/8/93 - 11/11/93

All Science Journal Classification (ASJC) codes

  • Software
