کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
396693 670552 2015 13 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
An effective candidate generation method for improving performance of edit similarity query processing
ترجمه فارسی عنوان
یک روش تولید کاندیدای موثر برای بهبود عملکرد پردازش پرس و جو ویرایش صریح
کلمات کلیدی
پردازش پرسشی مشابه جستجوی مشابه ویرایش فاصله، نسل کاندید
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی


• We develop an intersection based candidate generation scheme.
• We prove that candidates can be generated using a DNF of inverted lists.
• We propose a dynamic programming algorithm to select an optimal generation plan.
• We experimentally show that the proposed technique outperforms existing techniques.

In this paper, we study edit similarity query processing to find strings similar to a query string from a collection of strings. To solve the problem, many algorithms have been proposed under a filter-and-verification framework, where candidate strings are generated and refined using a few filters and then verified to find true matches. A major focus of those algorithms has been on generating candidates as small as possible in an early stage of the query processing. A typical approach to generate candidates is to extract some signatures from a query and take union of string ids in the inverted lists of the extracted signatures. However, the number of candidates generated from existing techniques is extremely larger than the number of answer strings and costs for refinement and verification are expensive. To address the problem, we propose an intersection-based candidate generation scheme, which generates a substantially smaller number of candidates. Given some signatures of a query, the proposed scheme first categorizes signatures into several groups. Then, it takes intersection of string ids in the inverted lists of the signatures in each group. Finally, it takes union of the intersections to generate candidates. To minimize the number of candidates under our scheme, we propose a novel algorithm which judiciously selects an optimal signature group. We show through experiments that our technique is very effective in reducing the number of candidates and significantly improves the performance.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Systems - Volume 47, January 2015, Pages 116–128
نویسندگان
,