Article ID | Journal | Published Year | Pages |
---|---|---|---|
865553 | Tsinghua Science & Technology | 2009 | 12 |
Abstract
This paper presents an effective keyword search method for data-centric extensible markup language (XML) documents. The method divides an XML document into compact, connected, integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees), which contain all or some of the input keywords, are then identified as candidate answers to the keyword search. Indexing is used to accelerate the retrieval of MSI-Trees related to the input keywords. The MSI-Trees are ranked to identify the top-k results with the highest ranks. Extensive tests demonstrate that this method takes 10-100 ms to answer a keyword query and outperforms existing approaches by 1-2 orders of magnitude.
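The pipeline summarized in the abstract (split the document into subtrees, index keywords against those subtrees, rank, return the top-k) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define how the schema guide produces SI-Trees or how MSI-Trees are ranked, so the sketch assumes that each child of the root element stands in for one SI-Tree and that a subtree's rank is simply the number of distinct query keywords it contains. The sample document and the names `split_into_subtrees`, `build_index`, and `top_k` are hypothetical.

```python
import heapq
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<bib>
  <paper><title>XML keyword search</title><author>Li</author></paper>
  <paper><title>Relational query processing</title><author>Feng</author></paper>
  <paper><title>Keyword search over graphs</title><author>Zhou</author></paper>
</bib>"""


def split_into_subtrees(root):
    """Assumption: each child of the root is one unit (a stand-in for an SI-Tree)."""
    return list(root)


def build_index(subtrees):
    """Build an inverted index: lowercase token -> set of subtree ids."""
    index = defaultdict(set)
    for subtree_id, subtree in enumerate(subtrees):
        for text in subtree.itertext():
            for token in text.lower().split():
                index[token].add(subtree_id)
    return index


def top_k(index, keywords, k):
    """Return up to k (subtree_id, score) pairs, best first.

    Assumption: the score is the number of distinct query keywords found in
    the subtree, so subtrees matching only some keywords still qualify.
    Ties are broken by the smaller subtree id.
    """
    scores = defaultdict(int)
    for keyword in {w.lower() for w in keywords}:
        for subtree_id in index.get(keyword, ()):
            scores[subtree_id] += 1
    return heapq.nsmallest(k, scores.items(), key=lambda item: (-item[1], item[0]))


root = ET.fromstring(SAMPLE)
subtrees = split_into_subtrees(root)
index = build_index(subtrees)
results = top_k(index, ["xml", "keyword", "search"], k=2)
print(results)
```

Because the index maps each keyword straight to the subtrees containing it, a query only touches the posting sets of its own keywords rather than scanning the whole document, which is the general reason indexing speeds up retrieval.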
Related Topics
Physical Sciences and Engineering
Engineering
Engineering (General)
Authors
Li (李国良), Feng (冯建华), Zhou (周立柱)