A heuristic approach for λ-representative information retrieval from large-scale data

Article ID	Journal	Published Year	Pages	File Type
6858139	Information Sciences	2014	17 Pages	PDF

Abstract

Retrieving representative information from large-scale data has become an important research issue, especially in the context of mobile business/search, where screen size and navigability are limited. This paper focuses on certain aspects of representativeness in database queries and web search, and proposes an approach for extracting a subset of the original search results that achieves high coverage and low redundancy. The notion of λ-represent is introduced to describe the λ-represent relationship between sets of data objects. The λ-representative problem is then formulated as an extension of the typical set covering problem, which leads to a heuristic approach (namely, LamRep) for solving the problem effectively and efficiently. Notably, LamRep incorporates a “vote” mechanism and is enhanced with an algorithmic acceleration strategy. Experiments on benchmark data and a real-world example show that LamRep outperforms the other approaches.
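
To make the set-covering view concrete, the following is a minimal Python sketch of the classic greedy set-cover heuristic applied to representative selection. It is not LamRep: the abstract does not specify the "vote" mechanism or the acceleration strategy, so neither is modelled here. The function `greedy_representatives`, the `jaccard` similarity, the budget `k`, and the reading of "λ-represent" as "similarity of at least `lam`" are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


def greedy_representatives(
    items: Sequence[T],
    similarity: Callable[[T, T], float],
    lam: float,
    k: int,
) -> List[int]:
    """Pick up to k indices whose lam-neighbourhoods cover as many items as possible.

    Assumption: item i "represents" item j when similarity(i, j) >= lam.
    This is the textbook greedy set-cover step, not the paper's LamRep.
    """
    n = len(items)
    # covers[i] holds the indices of all items that item i is assumed to represent.
    covers: List[Set[int]] = [
        {j for j in range(n) if similarity(items[i], items[j]) >= lam}
        for i in range(n)
    ]
    uncovered: Set[int] = set(range(n))
    chosen: List[int] = []
    while uncovered and len(chosen) < k:
        # Take the item that covers the most still-uncovered items (ties: lowest index).
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            break  # no remaining item adds coverage
        chosen.append(best)
        uncovered -= gain
    return chosen


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two term sets (hypothetical choice of measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy "search results" represented as term sets.
docs = [
    {"apple", "fruit", "red"},
    {"apple", "fruit", "green"},
    {"apple", "fruit", "pie"},
    {"car", "engine", "fast"},
    {"car", "engine", "slow"},
]
picked = greedy_representatives(docs, jaccard, lam=0.5, k=2)
print(picked)
```

In this toy run, one result from each of the two topical groups is selected, which illustrates the coverage and redundancy trade-off in miniature: each pick adds only items not already represented, so near-duplicates of an already chosen result contribute no gain.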

Keywords

Heuristic algorithm; Information retrieval; Web search