Article code: 4970085
Journal code: 1450026
Publication year: 2017
English article: 10-page PDF
Full-text version: free download
English title of the ISI article
Redundancy-driven modified Tomek-link based undersampling: A solution to class imbalance
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
Abstract
Class imbalance is a problem that spans the data mining, machine learning and pattern recognition domains: learning from a data space with an unequal class distribution. Common classifiers trained on imbalanced data tend to be biased towards the class with the bulk of the instances, causing misclassification of incoming patterns/instances. The study reveals that the presence of redundant borderline instances and outliers in the data space severely aggravates the effect of class imbalance. The Condensed Nearest Neighbor and Tomek-link undersampling techniques are used as the baseline systems for the present study, and an improved undersampling algorithm is proposed for the pre-processing stage that adds outlier and redundancy detection to the baseline system. The proposed scheme detects outlier, redundant and noisy instances that contribute least to estimating accurate class labels. Thus, a data-level solution is offered to the problem, its novelty lying in the effective elimination of majority instances without loss of valuable information. The proposed scheme is implemented and validated with Back Propagation Neural Network (BPNN), K-Nearest-Neighbor (K-NN), Support Vector Machine (SVM) and Naive Bayes classifiers on 10 real-life datasets. The experimental results show that the proposed scheme outperforms the baseline schemes.
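For orientation, the classical Tomek-link baseline named in the abstract can be sketched as follows. This is a minimal illustration of the standard technique only, not the modified, redundancy-driven algorithm the paper proposes, whose details are not given here. A Tomek link is a pair of instances from different classes that are each other's nearest neighbor; the baseline removes the majority-class member of each such pair. The function names `tomek_links` and `undersample_majority`, the use of Euclidean distance, and the toy data are assumptions made for this example.

```python
import numpy as np


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Assumes Euclidean distance; builds a dense distance matrix, so it is
    only suitable for small datasets.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (enough for nearest-neighbor search).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor
    nn = dist.argmin(axis=1)        # nearest neighbor of every instance
    links = []
    for i, j in enumerate(nn):
        # Mutual nearest neighbors with different class labels.
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links


def undersample_majority(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    drop = set()
    for i, j in tomek_links(X, y):
        if y[i] == majority_label:
            drop.add(i)
        if y[j] == majority_label:
            drop.add(j)
    keep = np.array([k for k in range(len(y)) if k not in drop], dtype=int)
    return X[keep], y[keep], sorted(drop)


# Toy 1-D data (hypothetical): class 0 is the majority class.
X = [[0.0], [1.1], [2.0], [2.4], [5.0]]
y = [0, 0, 0, 1, 1]
X_res, y_res, dropped = undersample_majority(X, y, majority_label=0)
```

In this toy data, the instances at 2.0 (class 0) and 2.4 (class 1) are mutual nearest neighbors from different classes, so they form the only Tomek link and the majority-class instance at 2.0 is the one removed. A production setting would more likely use a tree-based neighbor search than a dense distance matrix.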
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 93, 1 July 2017, Pages 3-12