Free article download: Clustering-based undersampling in class-imbalanced data

Article code	Journal code	Publication year	English article	Full-text version
4944376	1437988	2017	10-page PDF	Free download

English title of the ISI article

Clustering-based undersampling in class-imbalanced data

Persian translation of the title

الگوریتم جمع آوری مبتنی بر خوشه بندی در داده های عدم تعادل کلاس

Keywords

Class imbalance, Imbalanced data, Machine learning, Clustering, Classifier ensembles

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

Class imbalance is often a problem in various real-world data sets, where one class (i.e. the minority class) contains a small number of data points and the other (i.e. the majority class) contains a large number of data points. It is notably difficult to develop an effective model using current data mining and machine learning algorithms without considering data preprocessing to balance the imbalanced data sets. Random undersampling and oversampling have been used in numerous studies to ensure that the different classes contain the same number of data points. A classifier ensemble (i.e. a structure containing several classifiers) can be trained on several different balanced data sets for later classification purposes. In this paper, we introduce two undersampling strategies in which a clustering technique is used during the data preprocessing step. Specifically, the number of clusters in the majority class is set to be equal to the number of data points in the minority class. The first strategy uses the cluster centers to represent the majority class, whereas the second strategy uses the nearest neighbors of the cluster centers. A further study was conducted to examine the effect on performance of the addition or deletion of 5 to 10 cluster centers in the majority class. The experimental results obtained using 44 small-scale and 2 large-scale data sets revealed that the clustering-based undersampling approach with the second strategy outperformed five state-of-the-art approaches. Specifically, this approach combined with a single multilayer perceptron classifier and C4.5 decision tree classifier ensembles delivered optimal performance over both small- and large-scale data sets.
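The two undersampling strategies described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes plain Lloyd's k-means with Euclidean distance, and the function names `kmeans` and `undersample_majority`, the `use_nearest` flag, and the synthetic Gaussian data are all hypothetical choices made for this example. The number of clusters is set to the size of the minority class, as the abstract states.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its closest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        # Recompute each center as the mean of its group; an empty group
        # keeps its previous center.
        new_centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def undersample_majority(majority, minority, use_nearest=True, seed=0):
    """Reduce the majority class to len(minority) representatives."""
    # Number of clusters equals the number of minority data points.
    centers = kmeans(majority, k=len(minority), seed=seed)
    if not use_nearest:
        # Strategy 1: the (synthetic) cluster centers stand in for the class.
        return centers
    # Strategy 2: the real majority point nearest to each cluster center.
    # Note: this simple version does not guard against two centers
    # sharing the same nearest point.
    return [min(majority, key=lambda p: math.dist(p, c)) for c in centers]


# Hypothetical toy data: 200 majority points, 20 minority points.
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(20)]

centers = undersample_majority(majority, minority, use_nearest=False)
neighbors = undersample_majority(majority, minority, use_nearest=True)
print(len(centers), len(neighbors))  # both equal len(minority)
```

Either returned list can be combined with the minority class to form a balanced training set. The second strategy keeps only observed data points, whereas the first trains on cluster means that may not correspond to any real record.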

Publisher

Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 409–410, October 2017, Pages 17-26

Authors

Lin Wei-Chao, Tsai Chih-Fong, Hu Ya-Han, Jhang Jing-Shang
