Article code: 530042
Journal code: 869733
Publication year: 2016
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets
Keywords
Machine learning; Imbalanced classification; Multi-class imbalance; Oversampling; Minority class types
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
Abstract


• A thorough analysis of oversampling for handling multi-class imbalanced datasets.
• A proposal for detecting underlying structures and example types in the considered classes.
• Smart oversampling based on extracted knowledge about imbalance distribution types.
• In-depth insight into the importance of selecting proper examples for oversampling.
• Guidelines for designing efficient classifiers for multi-class imbalanced data.

Canonical machine learning algorithms assume that the numbers of objects in the considered classes are roughly similar. However, in many real-life situations the distribution of examples is skewed, since the examples of some classes appear much more frequently than others. This poses a difficulty for learning algorithms, as they become biased towards the majority classes. In recent years many solutions have been proposed to tackle imbalanced classification, yet they mainly concentrate on binary scenarios. Multi-class imbalanced problems are far more difficult, as the relationships between the classes are no longer straightforward. Additionally, one should analyze not only the imbalance ratio but also the characteristics of the objects within each class. In this paper we present a study on oversampling for multi-class imbalanced datasets that focuses on the analysis of the class characteristics. We detect subsets of specific examples in each class and fix the oversampling for each of them independently. Thus, we are able to use information about the class structure and boost the more difficult and important objects. We carry out an extensive experimental analysis, backed up by statistical analysis, in order to check when preprocessing only some types of examples within a class may improve on the indiscriminate preprocessing of all the examples in all the classes. The results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing that does not consider the importance of example types.
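
The idea of detecting example types within each class and oversampling only selected types can be illustrated with a minimal sketch. The abstract does not name the types or the oversampling method, so everything below is an assumption: the four labels (safe, borderline, rare, outlier) and their thresholds follow a neighbourhood-based taxonomy that is common in the imbalanced-learning literature, random duplication stands in for whichever oversampler the paper uses, and the function names `example_types` and `oversample_types` are hypothetical.

```python
import numpy as np

def example_types(X, y, k=5):
    """Label each example by the share of its k nearest neighbours in its own class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Brute-force pairwise squared Euclidean distances (fine for small data).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # an example is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]
    frac = (y[nn] == y[:, None]).sum(axis=1) / k
    # Assumed thresholds: with k=5 they give 4-5 same-class neighbours = safe,
    # 2-3 = borderline, 1 = rare, 0 = outlier.
    types = np.full(len(y), "outlier", dtype=object)
    types[frac > 0.0] = "rare"
    types[frac > 0.3] = "borderline"
    types[frac > 0.7] = "safe"
    return types

def oversample_types(X, y, boost, k=5, seed=0):
    """Randomly duplicate examples of the chosen types, class by class.

    boost maps a class label to the set of example types to replicate.
    Each listed class is grown to the size of the largest class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    types = example_types(X, y, k)
    _, counts = np.unique(y, return_counts=True)
    target = counts.max()
    extra = []
    for label, wanted in boost.items():
        in_class = y == label
        chosen = np.array([t in wanted for t in types])
        pool = np.flatnonzero(in_class & chosen)
        deficit = target - in_class.sum()
        if pool.size == 0 or deficit <= 0:
            continue  # nothing to replicate for this class
        extra.append(rng.choice(pool, size=deficit, replace=True))
    if not extra:
        return X, y
    idx = np.concatenate(extra)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])
```

Calling `oversample_types(X, y, boost={1: {"borderline", "rare"}})` would, under these assumptions, grow class 1 using only its borderline and rare examples, whereas passing all four types for every class reduces to plain random oversampling.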


Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 57, September 2016, Pages 164–178