Article code: 4947098
Journal code: 1439565
Publication year: 2017
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
SOM-based partial labeling of imbalanced data stream
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Data streams are found in many large-scale systems, such as security, finance, and the Internet. In many data streams the class distribution is imbalanced, and hence most traditional classification models fail to produce high accuracy for samples from the minority class. In addition, data streams change over time, so the model must be updated to maintain classification performance. However, obtaining the true class labels of the data samples is not easy: the labeling process is extremely time-consuming, and class labels are very often not available immediately after classification. The goal of this research is to reduce the labeling required for an imbalanced data stream while still producing high classification performance compared with a fully labeled setting. In an imbalanced data stream, the challenging part is to find and label minority class samples. In this paper, we propose the RLS-SOM (Reduced Labeled Samples-Self Organizing Map) framework for classification of an imbalanced data stream in a non-stationary environment. RLS-SOM locates the minority class samples in the feature space using a SOM. It maintains an ensemble of classifiers and builds a new model when changes occur, using only partially labeled samples. In RLS-SOM, classification results are obtained from the ensemble as well as from each individual model in the ensemble. An individual model's classification results are selected over the ensemble's results if its performance is higher than the ensemble's performance. This comparison is performed because one model in the ensemble may outperform the ensemble as a whole. Our experimental results demonstrate that RLS-SOM obtains higher performance than several partial labeling techniques on benchmark data sets. In addition, experimental comparisons with state-of-the-art fully labeling methods such as UCB, SERA, SEA, and Learn++.CDS show that RLS-SOM maintains equivalent classification performance while using 10-30% labeling, on average.
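The selection rule described in the abstract (prefer a single ensemble member when it outperforms the ensemble) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the ensemble combines votes, which performance metric is compared, or which samples are used for the comparison, so the unweighted majority vote, the minority-class recall metric, and every name below (`Model`, `majority_vote`, `minority_recall`, `select_predictor`) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model type: maps a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def majority_vote(models: List[Model], x: Sequence[float]) -> int:
    """Ensemble prediction by unweighted majority vote (an assumption)."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def minority_recall(predict: Model, X, y, minority: int = 1) -> float:
    """Fraction of minority-class samples that `predict` labels correctly.

    The metric is an assumption; the abstract only says "performance".
    """
    hits = [predict(x) == minority for x, label in zip(X, y) if label == minority]
    return sum(hits) / len(hits) if hits else 0.0


def select_predictor(models: List[Model], X_labeled, y_labeled,
                     minority: int = 1) -> Model:
    """Return the ensemble, or one member if it scores strictly higher."""
    def ensemble(x: Sequence[float]) -> int:
        return majority_vote(models, x)

    best: Model = ensemble
    best_score = minority_recall(ensemble, X_labeled, y_labeled, minority)
    for m in models:
        score = minority_recall(m, X_labeled, y_labeled, minority)
        if score > best_score:  # an individual model beats the current best
            best, best_score = m, score
    return best


# Toy example: one member detects the minority class (label 1), two never do.
def detector(x):
    return 1 if x[0] > 0.5 else 0

def always_majority_a(x):
    return 0

def always_majority_b(x):
    return 0

models = [detector, always_majority_a, always_majority_b]
X_labeled = [[0.9], [0.8], [0.1], [0.2]]
y_labeled = [1, 1, 0, 0]

chosen = select_predictor(models, X_labeled, y_labeled)
```

In this toy setting the two constant models outvote `detector`, so the ensemble never predicts the minority class, and the rule falls back to `detector` alone. With a different metric or a weighted vote the choice could differ.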
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 262, 1 November 2017, Pages 120-133