Design a batched information retrieval system based on a concept-lattice-like structure

Article ID	Journal	Published Year	Pages	File Type
6861456	Knowledge-Based Systems	2018	20 Pages	PDF

Abstract

Information retrieval (IR), regarded as one of the most popular and challenging research areas owing to the rapid growth of web data, serves as a fundamental technology for processing and analyzing large-scale datasets. IR systems often must handle massive, continuous streams of retrieval requests in data matching, information filtering, and other application scenarios. However, general IR applications such as search engines mostly emphasize individual response time, and their efficiency in handling massive numbers of queries relies mainly on caching or similar technologies. To improve the overall efficiency of handling massive queries, we design a batched information retrieval system that first analyzes a batch of queries and then exploits the repetition, similarity, and correlation among them to accelerate retrieval. First, a concept-lattice-like structure called the keyword-DAG (directed acyclic graph) is used to store and organize the similarity among queries. Accordingly, a keyword-DAG processing algorithm, called pruning, is devised to implement the batched retrieval. Then an incremental ranking algorithm is presented for batched IR scenarios, which has been demonstrated, both in theory and in practice, to shorten retrieval time remarkably. Finally, an overall planning algorithm is proposed for choosing the optimal pruning path and improving memory utilization. The experimental results show that our approach performs far better than the traditional separate retrieval method in mass data processing and analysis scenarios.
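
The abstract does not specify the keyword-DAG, pruning, or planning algorithms, so the following is only a minimal Python sketch of the general idea of exploiting repetition and overlap within a batch of queries. It assumes conjunctive (AND) keyword queries over an in-memory inverted file; the names `build_inverted_index` and `batched_retrieve` are hypothetical, and the subset-reuse strategy is an illustrative stand-in rather than the authors' method.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the set of document IDs containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def batched_retrieve(
    index: Dict[str, Set[int]], queries: List[Iterable[str]]
) -> List[Set[int]]:
    """Answer a batch of AND-queries, reusing results shared among queries.

    Assumption (not from the paper): queries are keyword sets, and a query's
    result can start from the cached result of its largest already-answered
    proper subset, so only the remaining keywords need to be intersected.
    """
    # Duplicate queries collapse to one keyword set; smaller sets go first
    # so that they are available for reuse by their supersets.
    unique = sorted({frozenset(q) for q in queries}, key=len)
    cache: Dict[FrozenSet[str], Set[int]] = {}

    for q in unique:
        # Largest non-empty cached keyword set strictly contained in q.
        best = max((s for s in cache if s and s < q), key=len, default=None)
        if best is None:
            remaining, result = set(q), None
        else:
            remaining, result = q - best, set(cache[best])
        for keyword in remaining:
            postings = index.get(keyword, set())
            result = set(postings) if result is None else result & postings
        # An empty query is defined here to match no documents.
        cache[q] = result if result is not None else set()

    return [cache[frozenset(q)] for q in queries]
```

In this sketch, a query such as {concept, retrieval, batched} starts from the cached result for {concept, retrieval} and intersects only the postings of "batched", while a repeated query is answered once and looked up afterwards.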

Keywords

Ranking algorithm; Information retrieval; Caching; Concept lattice; Information filtering; Inverted file