Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6861456 | Knowledge-Based Systems | 2018 | 20 Pages |
Abstract
Information retrieval (IR), widely regarded as one of the most popular and challenging research areas owing to the rapid growth of web data, serves as a fundamental technology for processing and analyzing large-scale datasets. IR systems usually have to handle massive, continuous streams of retrieval requests in data matching, information filtering and other application scenarios. However, general IR applications such as search engines mostly emphasize individual response time, and their efficiency in handling massive numbers of queries relies mainly on caching or similar technologies. To improve the overall efficiency of handling massive queries, we design a batched information retrieval system that first analyzes a batch of queries and then exploits the repeats, similarities and correlations among them to accelerate retrieval. A concept-lattice-like structure called the keyword-DAG (Directed Acyclic Graph) is first used to store and organize the similarity among queries. Accordingly, a keyword-DAG processing algorithm, named pruning, is devised to implement the batched retrieval. Then an incremental ranking algorithm is presented for batched IR scenarios, which has been demonstrated (both in theory and in practice) to remarkably shorten the retrieval time. Finally, an overall planning algorithm is proposed for choosing the optimal pruning path and improving memory utilization. The experimental results show that our approach performs far better than the traditional separate retrieval method in mass data processing and analysis scenarios.
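The general idea of reusing work across a batch of similar queries can be illustrated with a minimal sketch. This is not the authors' keyword-DAG or pruning algorithm, whose details the abstract does not give. It assumes conjunctive (AND) keyword queries over an inverted index, and the names `build_index` and `batched_retrieve` are hypothetical. Under that assumption, the result of a query is always a subset of the result of any query whose keywords it contains, so an already answered smaller query can serve as the starting candidate set for a larger one.

```python
from collections import defaultdict


def build_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def batched_retrieve(queries, index, all_ids):
    """Answer a batch of AND-queries, sharing work between related queries.

    Illustrative assumption: each query is a collection of keywords and a
    document matches only if it contains every keyword.
    """
    # Repeated queries collapse into a single keyword set (one node each).
    nodes = sorted({frozenset(q) for q in queries}, key=len)
    results = {}
    for node in nodes:  # smaller keyword sets first, so parents are ready
        # Use the largest already-answered proper subset as the parent node.
        parent = max((p for p in results if p < node), key=len, default=None)
        if parent is None:
            candidates, remaining = set(all_ids), node
        else:
            candidates, remaining = results[parent], node - parent
        # Only the keywords not covered by the parent need to be intersected.
        for keyword in remaining:
            candidates = candidates & index.get(keyword, set())
        results[node] = candidates
    # Return the answers in the original query order.
    return [results[frozenset(q)] for q in queries]
```

In this sketch, a batch such as `["batch"]`, `["batch", "retrieval"]`, `["batch"]` evaluates the repeated query once and narrows the second query from the first query's result instead of starting from the full collection.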
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Ming Huang, Jiajun Lin, Yong Peng, Xing Xie