Article code: 382233
Journal code: 660745
Publication year: 2016
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract


• A new oversampling method for imbalanced dataset classification is presented.
• It clusters the minority class and identifies borderline minority instances.
• Considering majority class during minority class clustering improves oversampling.
• Cluster size after oversampling should be dependent on its misclassification error.
• Generated synthetic instances improved subsequent classification.
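
The fourth highlight can be illustrated with a small sketch. The highlights do not give the paper's actual sizing rule, so the proportional split below is an assumption made only for illustration, and `allocate_cluster_sizes` is a hypothetical helper name. It divides a target number of minority instances across sub-clusters in proportion to each sub-cluster's misclassification error, so that harder sub-clusters end up larger after oversampling.

```python
def allocate_cluster_sizes(errors, total):
    """Split `total` instances across sub-clusters in proportion to `errors`.

    Assumption for illustration: a plain proportional rule with
    largest-remainder rounding, not the rule used in the paper.
    """
    if not errors:
        return []
    err_sum = sum(errors)
    if err_sum == 0:
        # No sub-cluster is misclassified: fall back to an even split.
        quotas = [total / len(errors)] * len(errors)
    else:
        quotas = [total * e / err_sum for e in errors]

    sizes = [int(q) for q in quotas]  # floor of each quota
    leftover = total - sum(sizes)

    # Give the remaining instances to the largest fractional parts first.
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

In this sketch `errors` could hold, for example, per-sub-cluster misclassification counts obtained from cross validation; any non-negative measure works as long as it is comparable across sub-clusters.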

In many applications, the dataset for classification may be highly imbalanced where most of the instances in the training set may belong to one of the classes (majority class), while only a few instances are from the other class (minority class). Conventional classifiers will strongly favor the majority class and ignore the minority instances. In this paper, we present a new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for imbalanced binary dataset classification. The proposed method clusters the minority instances using a semi-unsupervised hierarchical clustering approach and adaptively determines the size to oversample each sub-cluster using its classification complexity and cross validation. Then, the minority instances are oversampled depending on their Euclidean distance to the majority class. A-SUWO aims to identify hard-to-learn instances by considering minority instances from each sub-cluster that are closer to the borderline. It also avoids generating synthetic minority instances that overlap with the majority class by considering the majority class in the clustering and oversampling stages. Results demonstrate that the proposed method achieves significantly better results in most datasets compared with other sampling methods.
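
The distance-based oversampling step described in the abstract can be sketched as follows. This is a simplified illustration and not the A-SUWO algorithm itself: it omits the semi-unsupervised clustering and the adaptive sub-cluster sizing, and the inverse-distance weighting, the SMOTE-style interpolation between minority neighbours, the function name `weighted_oversample`, and the parameter `k` are all assumptions introduced here. Minority instances closer to the majority class receive a higher probability of being used as seeds for synthetic instances.

```python
import numpy as np

def weighted_oversample(X_min, X_maj, n_new, k=3, seed=0):
    """Create n_new synthetic minority points, favouring borderline seeds.

    Requires at least two minority instances. Returns the synthetic
    points and the seed-selection probabilities.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)

    # Euclidean distance from each minority instance to its nearest
    # majority instance.
    d_maj = np.linalg.norm(
        X_min[:, None, :] - X_maj[None, :, :], axis=2
    ).min(axis=1)

    # Assumed weighting: the closer to the majority class, the higher
    # the probability of being picked as a seed.
    w = 1.0 / (d_maj + 1e-12)
    p = w / w.sum()

    # Indices of the k nearest minority neighbours of each minority
    # instance, excluding the instance itself.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d_min, axis=1)[:, :k]

    seeds = rng.choice(len(X_min), size=n_new, p=p)
    partners = neighbours[seeds, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment between each seed
    # and its chosen minority neighbour.
    gap = rng.random((n_new, 1))
    synthetic = X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
    return synthetic, p
```

Because every synthetic point lies on a segment between two minority instances, this sketch by itself does not prevent overlap with the majority class; according to the abstract, A-SUWO addresses that by also considering the majority class during clustering.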

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 46, 15 March 2016, Pages 405–416