This paper proposes a new approach to modeling documents in a personal office environment. The proposed document model (called the D_model) combines the relational and object-oriented paradigms and adopts a natural view of office documents. Documents are grouped into classes. Each class is characterized by a frame template, which describes the type of the documents in that class. A frame template is instantiated by supplying it with values, forming a frame instance that represents a synopsis of a particular document associated with the template. Frame instances can be grouped into folders according to the nature of their contents; a folder is therefore a set of frame instances that may or may not be associated with the same template. The D_model describes documents using dual hierarchies: a document type hierarchy, depicting the structural organization of the documents, and a folder organization, representing the user's logical file structure. The document type hierarchy exploits structural commonalities among frame templates and thereby helps classify documents. The folder organization mimics the user's real-world document filing system and gives the user an intuitively clear view of his or her file structure, which facilitates document retrieval and filing. We also present an algebra (called the D_algebra) for manipulating frame instances contained in folders. In contrast to existing algebraic languages, the D_algebra provides operators for manipulating heterogeneous sets (i.e., sets whose elements are of different types). The proposed document model and algebraic language have been implemented as part of TEXPROS, a personal document processing system currently running in our laboratory.
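The abstract describes frame templates, frame instances, folders, and the document type hierarchy only in prose. The Python sketch below illustrates one possible reading of these ideas. It assumes that a frame template is a named list of attribute names with an optional parent template, that a frame instance must supply exactly the template's attributes, and that a folder is a list of instances of any template. The names `FrameTemplate`, `FrameInstance`, `Folder`, `select`, and `restrict` are hypothetical illustrations. They are not TEXPROS code, and they are not the actual D_algebra operators.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FrameTemplate:
    """Hypothetical frame template: a named document type listing attribute names.

    A template may specialize a parent template and inherit its attributes,
    which gives a simple document type hierarchy.
    """
    name: str
    own_attributes: tuple[str, ...]
    parent: Optional[FrameTemplate] = None

    def attributes(self) -> tuple[str, ...]:
        # Inherited attributes come first, then the template's own ones.
        inherited = self.parent.attributes() if self.parent else ()
        return inherited + self.own_attributes

    def is_subtype_of(self, other: FrameTemplate) -> bool:
        # Walk up the parent chain looking for `other`.
        t: Optional[FrameTemplate] = self
        while t is not None:
            if t is other:
                return True
            t = t.parent
        return False

    def instantiate(self, **values: Any) -> FrameInstance:
        # Assumption: an instance must supply exactly the template's attributes.
        expected, given = set(self.attributes()), set(values)
        if expected != given:
            raise ValueError(
                f"{self.name}: missing {sorted(expected - given)}, "
                f"unexpected {sorted(given - expected)}"
            )
        return FrameInstance(self, dict(values))


@dataclass
class FrameInstance:
    """A synopsis of one document, filled in from a frame template."""
    template: FrameTemplate
    values: dict[str, Any]


@dataclass
class Folder:
    """A folder: a set of frame instances that need not share a template."""
    name: str
    instances: list[FrameInstance] = field(default_factory=list)

    def templates(self) -> set[str]:
        return {inst.template.name for inst in self.instances}


def select(folder: Folder, pred: Callable[[FrameInstance], bool], name: str) -> Folder:
    # Hypothetical selection over a heterogeneous set: the predicate sees every
    # instance, so it must tolerate attributes that some templates lack.
    return Folder(name, [i for i in folder.instances if pred(i)])


def restrict(folder: Folder, template: FrameTemplate, name: str) -> Folder:
    # Hypothetical operator: keep instances whose template is `template` or a subtype.
    return Folder(name, [i for i in folder.instances if i.template.is_subtype_of(template)])


# Illustrative data: three templates, two of which specialize "Document".
document = FrameTemplate("Document", ("title", "date"))
letter = FrameTemplate("Letter", ("sender", "receiver"), parent=document)
memo = FrameTemplate("Memo", ("author",), parent=document)
invoice = FrameTemplate("Invoice", ("vendor", "amount"))

inbox = Folder("Inbox", [
    letter.instantiate(title="Offer", date="2024-03-01", sender="Alice", receiver="Bob"),
    memo.instantiate(title="Meeting", date="2024-03-02", author="Bob"),
    invoice.instantiate(vendor="Acme", amount=120),
])

recent = select(inbox, lambda i: i.values.get("date", "") >= "2024-03-02", "Recent")
documents = restrict(inbox, document, "Documents")
print(sorted(inbox.templates()), sorted(recent.templates()), sorted(documents.templates()))
# prints: ['Invoice', 'Letter', 'Memo'] ['Memo'] ['Letter', 'Memo']
```

Because `inbox` mixes three templates, the predicate passed to `select` uses `dict.get`. Instances without a `date` attribute are therefore excluded rather than raising an error. This is only one simple way to handle heterogeneity, and the D_algebra may define its operators differently. `restrict` shows how the document type hierarchy could be used to pick out every instance of a template together with its specializations.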