Article ID Journal Published Year Pages File Type
865553 Tsinghua Science & Technology 2009 12 Pages PDF
Abstract
This paper presents an effective keyword search method for data-centric extensive markup language (XML) documents. The method divides an XML document into compact connected integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the XML document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees) are identified, which contain all or some of the input keywords for the keyword search in the XML documents. Indexing is used to accelerate the retrieval of MSI-Trees related to the input keywords. The MSI-Trees are ranked to identify the top-k results with the highest ranks. Extensive tests demonstrate that this method costs 10-100 ms to answer a keyword query, and outperforms existing approaches by 1-2 orders of magnitude.
Keywords
Related Topics
Physical Sciences and Engineering Engineering Engineering (General)
Authors
, , ,