Article ID: 1123316
Journal: Procedia - Social and Behavioral Sciences
Published Year: 2011
Pages: 7 Pages
File Type: PDF
Abstract

On the web, most structured document collections consist of documents that come from different sources and are marked up with different types of structures. This diversity of structures has led to the emergence of heterogeneous structured documents. The heterogeneity of structured documents poses new challenges for document representation in structured document retrieval. The representation model needs to handle various types of structures as well as multiple structures in a single document. Furthermore, the same information may be represented in different structures, and the information contained in different documents may be partial and inconsistent. Therefore, the linkage of semantically related elements across the document collection needs to be captured in the representation model. In this paper, we introduce a generic and flexible structured document model that represents heterogeneous structured documents as well as the correspondences between similar elements in the document collection.
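
As an illustration only, and not the model proposed in the paper, the requirements listed in the abstract (several structures per document, plus links between semantically related elements in different documents) can be sketched as a small data structure. All class, field, and method names below (`Element`, `Document`, `Collection`, `link`, `correspondences`) are hypothetical and chosen for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """A node in one structure of a document (hypothetical name)."""
    element_id: str
    label: str                      # e.g. "title", "heading", "page"
    text: str = ""
    children: list[Element] = field(default_factory=list)


@dataclass
class Document:
    """A document that may carry several structures at once."""
    doc_id: str
    # Maps a structure name (e.g. "logical", "layout") to its root element.
    structures: dict[str, Element] = field(default_factory=dict)


@dataclass
class Collection:
    """A document collection with correspondence links between elements."""
    documents: dict[str, Document] = field(default_factory=dict)
    # Each link is an unordered pair of (doc_id, element_id) references.
    links: set[frozenset] = field(default_factory=set)

    def add(self, doc: Document) -> None:
        self.documents[doc.doc_id] = doc

    def link(self, a: tuple[str, str], b: tuple[str, str]) -> None:
        """Record that two elements are semantically related."""
        self.links.add(frozenset((a, b)))

    def correspondences(self, ref: tuple[str, str]) -> list[tuple[str, str]]:
        """Return every element reference linked to `ref`, sorted."""
        return sorted(
            other
            for pair in self.links
            if ref in pair
            for other in pair
            if other != ref
        )


# Two documents from different sources describing the same content.
d1 = Document("d1", {"logical": Element("e1", "title", "Structured retrieval")})
d2 = Document(
    "d2",
    {
        "logical": Element("h1", "heading", "Structured retrieval"),
        "layout": Element("p1", "page"),
    },
)

collection = Collection()
collection.add(d1)
collection.add(d2)
collection.link(("d1", "e1"), ("d2", "h1"))
```

In this sketch the correspondence links are kept at the collection level rather than inside each document, so that elements marked up under different structure types can be related without changing either document's own markup. This is a design assumption of the example, not a claim about the paper's model.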

Related Topics
Social Sciences and Humanities > Arts and Humanities > Arts and Humanities (General)