Article ID Journal Published Year Pages File Type
6926133 Information Processing & Management 2018 13 Pages PDF
Abstract
Recent years have seen a great deal of work on collection selection. Most collection selection methods use central sample index (CSI) that consists of some documents sampled from each collection as collection description. The limitations of these methods are the usage of 'flat' meaning representations that ignore structure and relationships among words in CSI, and the calculation of query-collection similarity metric that ignore semantic distance between query words and indexed words. In this paper, we propose a knowledge based collection selection method (KBCS) to improve collection representation and query-collection similarity metric. KBCS models a collection as a weighted entity set and applies a novel query-collection similarity metric to select highly scored collections. Specifically, in the part of collection representation, context- and structure-based measures are employed to weight the semantic distance between two entities extracted from the sampled documents of a collection. In addition, the novel query-collection similarity metric takes the entity weight, collection size, and other factors into account. To enrich concepts contained in a query, DBpedia based query expansion is integrated. Finally, extensive experiments were conducted on a large webpage dataset, and DBpedia was chosen as the graph knowledge base. Experimental results demonstrate the effectiveness of KBCS.
Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
, , ,