Article code | Journal code | Publication year | English article | Full-text version
405906 | 678047 | 2016 | 10 pages | PDF
English title of the ISI article
An efficient algorithm for distributed density-based outlier detection on big data
Keywords
Density-based outlier, Local outlier factor, Distributed algorithm
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract (English)

Outlier detection is a popular problem in data management and multimedia analysis, with applications such as detecting noisy images, credit card fraud, and network intrusions. The density-based outlier is an important definition of an outlier: it computes a Local Outlier Factor (LOF) for each tuple in a data set to represent the degree to which that tuple is an outlier. This definition offers several significant advantages over other existing definitions. This paper focuses on the problem of distributed density-based outlier detection for large-scale data. First, we propose a Grid-Based Partition algorithm (GBP) as a data preparation method. GBP first splits the data set into several grids and then allocates these grids to the datanodes in a distributed environment. Second, we propose a Distributed LOF Computing method (DLC) that detects density-based outliers in parallel while requiring only a small amount of network communication. Finally, the efficiency and effectiveness of the proposed approaches are verified through a series of simulation experiments.
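The LOF score mentioned in the abstract follows the standard definition of Breunig et al. It uses four quantities: the k-distance of a point, the reachability distance, the local reachability density (lrd), and the ratio of the neighbours' lrd to the point's own lrd. The sketch below is a minimal single-machine illustration of that definition, not the paper's DLC method. It assumes exactly k neighbours per point (ties ignored) and no duplicate points. It also includes a hypothetical `grid_partition` helper that buckets points into grid cells and spreads the cells over nodes round-robin. This helper is a naive stand-in for GBP, not the authors' allocation strategy. The names `knn`, `lof_scores`, and `grid_partition` are illustrative only.

```python
import math
from collections import defaultdict


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]]


def lof_scores(points, k):
    """Brute-force LOF for every point (assumes exactly k neighbours, no duplicates)."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    neigh = [knn(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    kdist = [math.dist(points[i], points[neigh[i][-1]]) for i in range(n)]

    def reach_dist(p, o):
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        return max(kdist[o], math.dist(points[p], points[o]))

    # lrd(p) = 1 / mean reachability distance from p to its neighbours
    lrd = [k / sum(reach_dist(p, o) for o in neigh[p]) for p in range(n)]
    # LOF(p) = mean of lrd(o) / lrd(p) over the neighbours o of p
    return [sum(lrd[o] for o in neigh[p]) / (k * lrd[p]) for p in range(n)]


def grid_partition(points, cell_size, num_nodes):
    """Hypothetical helper: bucket points into grid cells, then deal cells to nodes round-robin."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(idx)
    assignment = defaultdict(list)
    for node, key in enumerate(sorted(cells)):
        assignment[node % num_nodes].extend(cells[key])
    return dict(cells), dict(assignment)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print([round(s, 2) for s in lof_scores(pts, k=2)])
    cells, nodes = grid_partition(pts, cell_size=5.0, num_nodes=2)
    print(nodes)
```

The example uses four clustered points plus one isolated point with k = 2. The clustered points score 1.0 and the isolated point scores about 13.1. Scores well above 1 indicate outliers. In a distributed setting, a point near a cell boundary also needs neighbours from adjacent cells, and this sketch does not handle that case.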

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 181, 12 March 2016, Pages 19–28