کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6874162 1441026 2018 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Why locality sensitive hashing works: A practical perspective
ترجمه فارسی عنوان
چرا هشن حساس به محل کار می کند: چشم انداز عملی
کلمات کلیدی
محل حساس حساس، احتمال برخورد، به یاد بیاورید، تحلیل واریانس، الگوریتم های تصادفی،
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی
In practice, however, users are more interested in recall rate, i.e., the probability that a random query collides with its r-near neighbor over a fixed LSH data structure hℓk. Implicitly or explicitly, Phkℓ(r) is often misinterpreted as recall rate and used to predict the performance of LSH. This is problematic because Phkℓ(r) is actually the expectation of recall rates. Interestingly, numerous empirical studies show that, for most (if not all) real datasets and a fixed sample of random LSH data structure, the recall rate is very close to Phkℓ(r). In this paper, we provide a theoretical justification for this phenomenon. We show that (1) for random datasets the recall rate is asymptotically equal to Phkℓ(r); (2) for arbitrary datasets the variance of the recall rate is very small as long as the parameter k and ℓ are properly chosen and the size of datasets is large enough. Our analysis (1) explains why the practical performance of LSH (the recall rate) matches so well with the theoretical expectation (Phkℓ(r)); and (2) indicates that, in addition to the nice theoretical guarantee, the mechanism by which LSH data structures are constructed and the huge amount of data are also the main causes for the success of LSH in practice.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing Letters - Volume 136, August 2018, Pages 49-58
نویسندگان
, , , ,