Article code: 382267
Journal code: 660754
Publication year: 2014
English article: 16 pages
Full text: PDF
English title (ISI article)
Prior-free rare category detection: More effective and efficient solutions
Keywords
Rare category detection, prior-free, histogram density estimation, wavelet analysis
Related subjects
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract


• We define a new criterion for rare category detection (RCD) using wavelet analysis.
• Our iFRED algorithm for RCD has time complexity linear in the data set size N.
• Theoretical analysis and experiments verify the effectiveness of our algorithms.

Identifying statistically significant anomalies in an unlabeled data set is of key importance in many applications, such as financial security and remote sensing. Rare category detection (RCD) addresses this problem by passing candidate data examples to a labeling oracle (e.g., a human expert) for labeling. A challenging task in RCD is to discover all categories without any prior information about the given data set. The few approaches proposed for this task have quadratic or cubic time complexity with respect to the data set size N, and they require many labeling queries, each consuming the time and costly effort of a human expert. In this paper, aiming at lower time complexity and fewer labeling queries, we propose two prior-free (i.e., requiring no prior information about the given data set) RCD algorithms: (1) iFRED, which achieves time complexity linear in N, and (2) vFRED, which substantially reduces the number of labeling queries. This is achieved by tabulating each dimension of the data set into bins; zooming out so that each bin shrinks to a single position; applying wavelet analysis to the resulting data density function to quickly locate the position (i.e., the bin) of a rare category; and zooming in on the located bin to select candidate data examples for labeling. Theoretical analysis guarantees the effectiveness of our algorithms, and comprehensive experiments on both synthetic and real data sets further verify their effectiveness and efficiency.
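The zoom-out / wavelet / zoom-in pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' iFRED or vFRED: the equal-width binning, the choice of finest-scale undecimated Haar detail coefficients as the wavelet analysis, the spike score, the toy data, and every function name (`bin_index`, `tabulate`, `haar_details`, `peak_scores`, `locate_candidates`, `make_demo_data`) are assumptions made for illustration. Each dimension is tabulated into bins in one O(N) pass; the bins are then treated as positions, and a bin where the Haar detail coefficient swings from strongly positive to strongly negative is taken as a compact density spike that may hide a rare category; finally, the examples in that bin are returned as candidates that would be passed to the labeling oracle.

```python
import math
import random
from typing import List, Sequence, Tuple


def bin_index(v: float, lo: float, width: float, num_bins: int) -> int:
    """Map a value to its equal-width bin; the maximum value lands in the last bin."""
    return min(int((v - lo) / width), num_bins - 1)


def tabulate(values: Sequence[float], num_bins: int) -> Tuple[List[int], float, float]:
    """Tabulate one dimension into equal-width bins in a single O(N) pass."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins if hi > lo else 1.0
    counts = [0] * num_bins
    for v in values:
        counts[bin_index(v, lo, width, num_bins)] += 1
    return counts, lo, width


def haar_details(counts: Sequence[int]) -> List[float]:
    """Finest-scale undecimated Haar detail coefficients (c[i+1] - c[i]) / sqrt(2)."""
    return [(counts[i + 1] - counts[i]) / math.sqrt(2) for i in range(len(counts) - 1)]


def peak_scores(counts: Sequence[int]) -> List[float]:
    """Score each interior bin by how sharply the detail coefficient drops across it.

    A high score means density rises steeply into the bin and falls steeply out
    of it: a compact spike, as a small, tight cluster would produce.
    Boundary bins are scored 0 in this sketch.
    """
    d = haar_details(counts)
    scores = [0.0] * len(counts)
    for i in range(1, len(counts) - 1):
        scores[i] = d[i - 1] - d[i]
    return scores


def locate_candidates(data: Sequence[Sequence[float]],
                      num_bins: int) -> Tuple[int, int, List[int]]:
    """Zoom out to bin positions, locate the sharpest spike, then zoom back in.

    Returns (dimension, bin index, indices of the examples in that bin).
    """
    best = (-math.inf, 0, 0)  # (score, dimension, bin)
    for dim in range(len(data[0])):
        counts, _, _ = tabulate([row[dim] for row in data], num_bins)
        scores = peak_scores(counts)
        b = max(range(num_bins), key=scores.__getitem__)
        if scores[b] > best[0]:
            best = (scores[b], dim, b)
    _, dim, b = best
    column = [row[dim] for row in data]
    _, lo, width = tabulate(column, num_bins)  # recomputed here for clarity
    members = [i for i, v in enumerate(column) if bin_index(v, lo, width, num_bins) == b]
    return dim, b, members


def make_demo_data(seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical toy data: 1000 uniform 'majority' points plus 40 tightly clustered 'rare' points."""
    rng = random.Random(seed)
    data = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(1000)]
    data += [[0.0, 0.0], [10.0, 10.0]]  # pin each range to [0, 10], so bins are 0.2 wide
    rare_start = len(data)
    data += [[rng.gauss(6.3, 0.01), rng.gauss(6.3, 0.01)] for _ in range(40)]
    return data, list(range(rare_start, len(data)))


if __name__ == "__main__":
    data, rare = make_demo_data(seed=7)
    dim, b, members = locate_candidates(data, num_bins=50)
    found = len(set(rare) & set(members))
    print(f"dimension {dim}, bin {b}: {len(members)} candidates, {found}/{len(rare)} rare")
```

With the toy data, the 40 rare points fall in bin 31 (the interval [6.2, 6.4)) of whichever dimension wins, so they are returned together with the roughly 20 majority points that share that bin. The sketch costs O(N) per dimension for binning plus O(B) per dimension for the B bins, which is in the spirit of the linear-time claim; how iFRED and vFRED actually choose the bins, score positions, and pick candidates inside the located bin is not described in this abstract.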

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 17, 1 December 2014, Pages 7691–7706
Authors
, , , , ,