Article code: 533983
Journal code: 870201
Publication year: 2013
English article: 9 pages, PDF
Full-text version: free download
English title of the ISI article
Handling imbalanced data sets with synthetic boundary data generation using bootstrap re-sampling and AdaBoost techniques
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


• The location of the separating function in each class is defined by the boundary data.
• The synthetic boundary data are generated by bootstrapping.
• All new synthesized data were classified by an AdaBoost neural network.
• The proposed method achieves higher accuracy in both minority and majority classes.

The problem of imbalanced data between classes prevails in various applications such as bioinformatics. With imbalanced data, prediction accuracy is usually biased toward the majority class. However, in several applications, prediction accuracy in the minority class is as important as in the majority class. Many techniques have previously been proposed to increase accuracy in the minority class. These techniques are based on re-sampling, either over-sampling or under-sampling, during the training process. Those re-sampling techniques did not consider how the data are scattered in the space. In this paper, we propose a new technique based on the fact that the location of the separating function between any two sub-clusters in different classes is defined only by the boundary data of each sub-cluster. In addition, accuracy is measured only on the testing set. Our technique adapts the concept of bootstrapping to estimate a new region for each sub-cluster and to synthesize new boundary data. The new region is intended to cope with unseen testing data. All newly synthesized data were classified using the AdaBoost algorithm. Our results outperformed the other techniques under several performance evaluation functions.
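
The abstract does not spell out how the region of a sub-cluster is estimated or how boundary points are placed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes an axis-aligned box as the region, a bootstrap estimate of each feature's mean and standard deviation, and a hypothetical widening factor `k`; the function names `bootstrap_region` and `synthesize_boundary` and the toy data are illustrative only. The AdaBoost classification step is not covered here.

```python
import random
import statistics


def bootstrap_region(points, n_rounds=200, k=2.0, seed=0):
    """Estimate a per-feature (low, high) box for one sub-cluster.

    Assumption: the region is center +/- k * spread per feature, where
    center and spread are bootstrap averages of the mean and the
    population standard deviation. `k` is a hypothetical parameter.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    means = [[] for _ in range(dim)]
    stds = [[] for _ in range(dim)]
    for _ in range(n_rounds):
        # Draw n points with replacement (one bootstrap replicate).
        sample = [points[rng.randrange(n)] for _ in range(n)]
        for d in range(dim):
            column = [p[d] for p in sample]
            means[d].append(statistics.fmean(column))
            stds[d].append(statistics.pstdev(column))
    region = []
    for d in range(dim):
        center = statistics.fmean(means[d])
        spread = statistics.fmean(stds[d])
        region.append((center - k * spread, center + k * spread))
    return region


def synthesize_boundary(region, n_new, seed=0):
    """Sample synthetic points lying on the faces of the estimated box."""
    rng = random.Random(seed)
    dim = len(region)
    new_points = []
    for _ in range(n_new):
        # Start from a uniform point inside the box ...
        point = [rng.uniform(low, high) for low, high in region]
        # ... then push one randomly chosen feature onto a box face.
        d = rng.randrange(dim)
        point[d] = rng.choice(region[d])
        new_points.append(tuple(point))
    return new_points


# Toy minority sub-cluster with two features (made-up values).
minority = [(1.0, 2.0), (1.2, 1.9), (0.8, 2.3), (1.1, 2.1), (0.9, 1.8)]
region = bootstrap_region(minority)
synthetic = synthesize_boundary(region, n_new=10)
```

In a full pipeline under these assumptions, the synthetic points would be added to the training set of their class before fitting a boosted classifier, while the testing set stays untouched.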

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 34, Issue 12, 1 September 2013, Pages 1339–1347