Article code: 495373
Journal code: 862825
Publication year: 2014
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• SMOTE is a widely used preprocessing technique for imbalanced data.
• SMOTE performs badly in the presence of class noise.
• Existing techniques focus on cleaning the data after applying SMOTE.
• We clean the data before and after applying SMOTE using fuzzy rough set theory.
• Tests on real datasets with class noise show the good performance of our method.

The Synthetic Minority Over-sampling TEchnique (SMOTE) is a widely used technique for balancing imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, most of which clean or improve the data after applying SMOTE. Our approach differs in that it cleans the data before applying SMOTE, so that the quality of the generated instances is better. After applying SMOTE we also carry out data cleaning, so that instances (original or introduced by SMOTE) that fit badly in the new dataset are removed as well. To this end we propose two prototype selection techniques, both based on fuzzy rough set theory. The first fuzzy rough prototype selection algorithm removes noisy instances from the imbalanced dataset; the second cleans the data generated by SMOTE. An experimental evaluation shows that our method improves on existing preprocessing methods for imbalanced classification, especially in the presence of noise.
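
The clean, oversample, clean pipeline described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' algorithm: the quality measure is a simplified fuzzy-rough lower-approximation membership (crisp classes with the Łukasiewicz implicator, which reduces to one minus the highest similarity to any instance of another class), SMOTE is a bare-bones interpolation, and the function names and the threshold `tau` are hypothetical.

```python
import numpy as np

def lower_approx_quality(X, y):
    """Simplified fuzzy-rough quality: 1 - max similarity to an other-class instance."""
    X = np.asarray(X, dtype=float)
    # Scale each feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Z = (X - X.min(axis=0)) / span
    # Similarity R(a, b) = 1 - mean absolute feature difference, in [0, 1].
    sim = 1.0 - np.abs(Z[:, None, :] - Z[None, :, :]).mean(axis=2)
    other = y[:, None] != y[None, :]
    worst = np.where(other, sim, 0.0).max(axis=1)
    return 1.0 - worst

def clean(X, y, tau):
    """Keep only instances whose quality is at least tau (tau is an assumed threshold)."""
    keep = lower_approx_quality(X, y) >= tau
    return X[keep], y[keep]

def smote(X_min, n_new, k, rng):
    """Basic SMOTE: interpolate between a minority instance and one of its k nearest minority neighbours."""
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=2)
    np.fill_diagonal(d, np.inf)          # an instance is not its own neighbour
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))         # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def clean_smote_clean(X, y, minority, tau=0.02, k=5, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X, y = clean(X, y, tau)              # step 1: remove noisy originals
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new > 0 and len(X_min) > 1:     # step 2: oversample up to balance
        X = np.vstack([X, smote(X_min, n_new, k, rng)])
        y = np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)])
    return clean(X, y, tau)              # step 3: clean originals and synthetics
```

Cleaning first matters because SMOTE interpolates between minority instances: a mislabeled minority instance left inside the majority region would seed synthetic points there. Note that in this simplified measure a single mislabeled instance also lowers the quality of its genuine neighbours, so `tau` has to be chosen with care.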


Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Soft Computing - Volume 22, September 2014, Pages 511–517