Article code, Journal code, Publication year, English article, Full-text version
402132, 676862, 2016, 8 pages (PDF), Free download
English title of the ISI article
BitHash: An efficient bitwise Locality Sensitive Hashing method with applications
Keywords
Locality sensitive hashing; BitHash; Near-duplicate detection; Machine learning; Sentiment analysis; Storage efficiency
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract

Locality Sensitive Hashing has been applied to detecting near-duplicate images, videos and web documents. In this paper we present a bitwise Locality Sensitive Hashing method (BitHash) that uses only one bit per hash value, so the storage space for hash values is significantly reduced and the estimator can be computed much faster. The method provides an unbiased estimate of pairwise Jaccard similarity, and the estimator is a simple linear function of Hamming distance. We rigorously analyze the variance of One-Bit Min-Hash (BitHash), showing that for high Jaccard similarity BitHash can provide accurate estimation, and that as the pairwise Jaccard similarity increases, the variance ratio of BitHash to the original min-hash decreases. Furthermore, BitHash compresses each data sample into a compact binary hash code while preserving the pairwise similarity of the original data. The binary code can be used as a compressed and informative representation in place of the original data for subsequent processing; for example, it can be naturally integrated with a classifier such as an SVM. We apply BitHash to two typical applications, near-duplicate image detection and sentiment analysis. Experiments on a real user's photo collection and a popular sentiment analysis data set show that the classification accuracy of the proposed method in both applications approaches that of the state-of-the-art method, while BitHash requires significantly less storage space.
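
The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each one-bit value is the lowest bit of an ordinary min-hash value, and that two sets which do not share the minimum element agree on that bit with probability about 1/2. Under that assumption the probability that two bits match is (1 + J)/2, which gives the linear estimate J ≈ 1 - 2·d/k, where d is the Hamming distance between two k-bit codes. The names `_hash64`, `one_bit_signature`, `hamming` and `estimate_jaccard` are hypothetical, and the salted BLAKE2b hash is a stand-in for whatever hash family the paper uses.

```python
import hashlib


def _hash64(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`, salted by `seed` (assumed hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def one_bit_signature(items, k=1024):
    """Return k bits: the lowest bit of each of k min-hash values of `items`."""
    items = list(items)
    if not items:
        raise ValueError("cannot hash an empty set")
    # For every salted hash function keep only one bit of the minimum value.
    return [min(_hash64(seed, x) for x in items) & 1 for seed in range(k)]


def hamming(sig_a, sig_b):
    """Number of positions at which two equal-length bit codes differ."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(sig_a, sig_b))


def estimate_jaccard(sig_a, sig_b):
    """Linear estimate J ~ 1 - 2*d/k, assuming non-matching minima agree half the time."""
    return 1.0 - 2.0 * hamming(sig_a, sig_b) / len(sig_a)


if __name__ == "__main__":
    a = {f"w{i}" for i in range(100)}
    b = {f"w{i}" for i in range(50, 150)}  # true Jaccard similarity is 50/150
    sig_a, sig_b = one_bit_signature(a), one_bit_signature(b)
    print("estimated Jaccard:", estimate_jaccard(sig_a, sig_b))
```

Each set is stored as k bits rather than k full hash values, which is the storage saving the abstract refers to. The estimate is noisier than with full min-hash values, which matches the abstract's remark that accuracy is best at high Jaccard similarity.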

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 97, 1 April 2016, Pages 40–47