Article ID: 4947098
Journal: Neurocomputing
Published Year: 2017
Pages: 14
File Type: PDF
Abstract
Data streams are found in many large-scale systems in domains such as security, finance, and the internet. In many data streams the class distribution is imbalanced, and most traditional classifiers therefore fail to achieve high accuracy on samples from the minority class. In addition, data streams change over time, so the model must be updated to maintain classification performance. However, obtaining the true class labels of the data samples is not easy: the labeling process is extremely time-consuming, and class labels are very often not available immediately after classification. The goal of this research is to reduce the amount of labeling required for an imbalanced data stream while still achieving high classification performance compared to the fully labeled setting. In an imbalanced data stream, the challenging part is finding and labeling the minority class samples. In this paper, we propose the RLS-SOM (Reduced Labeled Samples-Self Organizing Map) framework for classifying imbalanced data streams in a non-stationary environment. RLS-SOM locates the minority class samples in the feature space using a SOM. It maintains an ensemble of classifiers and builds a new model when changes occur, using only partially labeled samples. In RLS-SOM, classification results are obtained from the ensemble as well as from each individual model in the ensemble. The results of an individual model are selected over the ensemble's results if that model's performance is higher than the ensemble's. This comparison is performed because a single model in the ensemble may outperform the ensemble as a whole. Our experimental results demonstrate that RLS-SOM achieves higher performance than several partial labeling techniques on benchmark data sets.
In addition, experimental comparisons with state-of-the-art fully labeled methods such as UCB, SERA, SEA, and Learn++.CDS show that RLS-SOM maintains equivalent classification performance while using only 10-30% labeling, on average.
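The selection rule described in the abstract (prefer an individual model over the ensemble when that model performs better) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the majority-vote ensemble, the minority-class recall metric, and all function names below are assumptions chosen for illustration, since the abstract does not specify how the ensemble is combined or how performance is measured.

```python
from collections import Counter


def majority_vote(models, x):
    """Assumed ensemble rule: the most common label among the models' votes."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def minority_recall(y_true, y_pred, minority=1):
    """Assumed metric: fraction of minority-class samples predicted correctly."""
    total = sum(1 for t in y_true if t == minority)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == minority and p == minority)
    return hits / total if total else 0.0


def select_predictor(models, X_labeled, y_labeled, score=minority_recall):
    """Compare the ensemble with each individual model on the labeled samples.

    Returns (kind, index, score), where kind is "ensemble" or "model".
    An individual model is chosen only if it scores strictly higher.
    """
    ens_pred = [majority_vote(models, x) for x in X_labeled]
    best = ("ensemble", None, score(y_labeled, ens_pred))
    for i, m in enumerate(models):
        s = score(y_labeled, [m(x) for x in X_labeled])
        if s > best[2]:
            best = ("model", i, s)
    return best


# Hypothetical models: two always predict the majority class (0),
# one thresholds the single feature and can detect the minority class (1).
always_zero = lambda x: 0
threshold = lambda x: 1 if x[0] > 0.5 else 0

X = [[0.9], [0.1], [0.8], [0.2]]
y = [1, 0, 1, 0]
result = select_predictor([always_zero, always_zero, threshold], X, y)
```

In this toy setting the two majority-class models outvote the threshold model, so the ensemble misses every minority sample and the threshold model is selected instead, which is the situation the comparison is meant to catch.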
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence