Article code: 378903
Journal code: 659233
Publication year: 2012
English article: 20 pages (PDF)
English title of the ISI article
Large scale instance selection by means of federal instance selection
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Instance selection is becoming more and more relevant due to the huge amount of data that is constantly being produced. However, although current algorithms are useful for fairly large datasets, many scaling problems are found when the number of instances is hundreds of thousands or millions. Most of the widely used instance selection algorithms are of complexity at least O(n²), n being the number of instances. When we face very large problems, the scalability becomes an issue, and most of the algorithms are not applicable.

This paper presents a methodology for scaling up instance selection algorithms by means of a parallel procedure that performs instance selection on small subsets of the original dataset. The results obtained with the application of instance selection to small subsets are combined using a voting scheme. The method achieves a very good performance in terms of testing error and storage reduction, while the execution time of the process is decreased very significantly. The parallel algorithm also removes any kind of constraint imposed by memory size, as the whole dataset does not need to be stored in memory.

The usefulness of our method is shown by an extensive comparison using 35 datasets of medium and large sizes from the UCI Machine Learning Repository. Additionally, our method is applied to eight very large datasets with very good results and fast execution time.
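
The abstract describes the procedure only in outline: run instance selection on small subsets, then combine the per-subset results by voting. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the base selector (Hart's condensed nearest neighbour, swapped in as a simple quadratic-cost example), the use of several rounds of random disjoint partitions, the fixed vote threshold, and the hypothetical names cnn_select and vote_select.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cnn_select(X, y, subset):
    """Condensed nearest neighbour restricted to the indices in `subset`.

    Assumed base selector, chosen only because it is short and has
    quadratic worst-case cost. Returns the indices that were kept.
    """
    store = [subset[0]]          # start from the first instance
    kept = {subset[0]}
    changed = True
    while changed:               # repeat until a full pass adds nothing
        changed = False
        for i in subset:
            if i in kept:
                continue
            nearest = min(store, key=lambda j: sq_dist(X[i], X[j]))
            if y[nearest] != y[i]:   # misclassified by the store: keep it
                store.append(i)
                kept.add(i)
                changed = True
    return store


def vote_select(X, y, subset_size=50, rounds=5, threshold=3, seed=0):
    """Keep instances selected in at least `threshold` of `rounds` rounds.

    Each round shuffles the indices and cuts them into disjoint subsets
    of at most `subset_size` instances (an assumed partitioning scheme).
    The subsets are independent, so a real run could process them in
    parallel; this sketch processes them sequentially.
    """
    rng = random.Random(seed)
    n = len(X)
    votes = [0] * n
    indices = list(range(n))
    for _ in range(rounds):
        rng.shuffle(indices)
        for start in range(0, n, subset_size):
            subset = indices[start:start + subset_size]
            for i in cnn_select(X, y, subset):
                votes[i] += 1    # one vote per round in which i was kept
    return [i for i in range(n) if votes[i] >= threshold]
```

Under these assumptions, a base selector that costs on the order of s² on a subset of s instances is run on roughly n/s subsets per round, so one round costs on the order of (n/s)·s² = n·s, which grows linearly in n for a fixed subset size. Only one subset needs to be in memory at a time, which is the property the abstract points to when it says the whole dataset does not need to be stored in memory.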

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 75, May 2012, Pages 58–77