Article code: 461022
Journal code: 696525
Publication year: 2015
English article: 8 pages, PDF
Full-text version: free download
English title of the ISI article
Learning to detect representative data for large scale instance selection
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications
Abstract


• The ReDD (Representative Data Detection) approach is introduced for large scale instance selection.
• In ReDD, a detector learns the patterns of (un)representative data after performing instance selection.
• The detector is then used to identify the representative instances among newly added data.
• We found that ReDD can reduce the computational cost and maintain the final classification accuracy.

Instance selection is an important data pre-processing step in the knowledge discovery process. However, the datasets of many domain problems are usually very large, and some are non-stationary, combining old data with a large volume of new samples. Current algorithms that address this scalability problem are limited in that they incur a very high computational cost when performing instance selection over very large datasets. To this end, we introduce the ReDD (Representative Data Detection) approach, which is based on outlier pattern analysis and prediction. First, a machine learning model, or detector, learns the patterns of (un)representative data that a specific instance selection method selects from a small amount of training data. The detector is then applied to the remaining bulk of the training data, or to newly added data, to identify which instances are representative. We empirically evaluate ReDD over 50 domain datasets to examine the effectiveness of the learned detector, and use four very large scale datasets for validation. The experimental results show that ReDD not only reduces the computational cost by a factor of nearly two to three compared with three baselines, but also maintains the final classification accuracy.
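
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which instance selection method or which detector model is used, so everything concrete here is an assumption made for illustration: Wilson's Edited Nearest Neighbour (ENN) rule stands in for the instance selection method, a per-class nearest-neighbour vote stands in for the detector, and the names `knn_vote`, `enn_select`, and `redd` are hypothetical.

```python
import math


def knn_vote(query, points, labels, k, skip=None):
    """Majority label among the k points nearest to `query` (index `skip` excluded)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip),
        key=lambda i: math.dist(query, points[i]),
    )[:k]
    votes = [labels[i] for i in order]
    return max(set(votes), key=votes.count)


def enn_select(X, y, k=3):
    """Assumed stand-in for the instance selection method (Wilson's ENN):
    an instance is kept only if its k nearest neighbours vote for its own class."""
    return [knn_vote(X[i], X, y, k, skip=i) == y[i] for i in range(len(X))]


def redd(X, y, seed_size, k=3, detector_k=1):
    """Run the costly selection on the first `seed_size` instances only, then let
    a cheap detector flag every remaining instance as representative or not."""
    seed_X, seed_y = X[:seed_size], y[:seed_size]
    keep_seed = enn_select(seed_X, seed_y, k)
    keep_rest = []
    for x, c in zip(X[seed_size:], y[seed_size:]):
        # Assumed detector: copy the keep/drop flag of the nearest seed
        # instances that carry the same class label as the new instance.
        same = [i for i in range(len(seed_X)) if seed_y[i] == c]
        if not same:
            keep_rest.append(True)  # no evidence for this class: keep by default
            continue
        keep_rest.append(
            knn_vote(x, [seed_X[i] for i in same],
                     [keep_seed[i] for i in same], detector_k)
        )
    return keep_seed + keep_rest


# Toy data: two clusters, each containing one mislabelled seed instance.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
     (0.4, 0.6), (0.6, 0.4), (10.4, 10.6), (10.6, 10.4)]  # last four are "new" data
y = [0, 0, 0, 0, 1,
     1, 1, 1, 1, 0,
     0, 1, 1, 0]

flags = redd(X, y, seed_size=10)
selected = [x for x, keep in zip(X, flags) if keep]
print(len(selected), "of", len(X), "instances kept")
```

In this sketch the selection rule runs on the 10 seed instances only, and the four later instances are flagged by the detector without re-running it, which is where a saving in computation would be expected to come from. A real detector would be trained on far more seed data and could be any classifier.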

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 106, August 2015, Pages 1–8