Article code: 406393
Journal code: 678081
Publication year: 2015
English article: 15-page PDF
Full-text version: Free download
English title of the ISI article
Improving classification rate constrained to imbalanced data between overlapped and non-overlapped regions by hybrid algorithms
Persian translation of the title
بهبود سرعت طبقه بندی محدود به داده های عدم تعادل بین مناطق همپوشانی و غیر همپوشانی با استفاده از الگوریتم های ترکیبی
Keywords
Highly imbalanced data with an overlapped region, Soft-Hybrid algorithms, responsive region mapping classification
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
English abstract

A new aspect of imbalanced data classification was studied. Unlike classical imbalanced data classification, where the problem is caused by the difference in class sizes, our study concerns only the situation in which two classes overlap. When one class overlaps another, the overlap induces three regions: the overlapped region shared by the two classes, and the non-overlapped region of each class. The imbalance is caused by the different amounts of data in the overlapped and non-overlapped regions. In this situation, the difference in data sizes between classes is not the main concern and has no effect on classification accuracy. In this research, a combined technique, called the Soft-Hybrid algorithm, was proposed for improving classification performance. The technique was divided into two main phases: boundary region determination, and responsive classification algorithms for each sub-area. In the first phase, data were grouped as (1) non-overlapping data, (2) borderline data, and (3) overlapping data. The data were learned using a modified Hausdorff distance, a Radial Basis Function (RBF) network, and K-Means clustering with the Mahalanobis distance. Then, a modified kernel learning method, a modified DBSCAN, and an RBF network were applied to classify the data into the proper groups based on statistical values from the classification phase. Finally, the results of all techniques were combined. The experimental results illustrated that the proposed method can significantly improve the effectiveness of classifying imbalanced data with large overlapping sections, as measured by TP rate, F-measure, and G-mean. Moreover, the computational times of the proposed method were lower than those of the standard algorithms used for this type of problem.
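
The abstract reports results in terms of TP rate, F-measure, and G-mean. As a reading aid, the following minimal sketch computes the three measures from binary confusion-matrix counts. It is not taken from the paper: the function name `imbalance_metrics` and the example counts are hypothetical, and it assumes the common definitions, namely F-measure as the balanced (beta = 1) harmonic mean of precision and TP rate, and G-mean as the geometric mean of TP rate and TN rate.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return TP rate, F-measure and G-mean from binary confusion-matrix counts.

    Assumed definitions (not confirmed by the abstract):
      TP rate   = tp / (tp + fn)
      F-measure = harmonic mean of precision and TP rate (beta = 1)
      G-mean    = sqrt(TP rate * TN rate)
    """
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    tn_rate = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    g_mean = math.sqrt(tp_rate * tn_rate)

    return {"tp_rate": tp_rate, "f_measure": f_measure, "g_mean": g_mean}


# Hypothetical counts for a minority (positive) class of 50 samples
# and a majority (negative) class of 950 samples.
scores = imbalance_metrics(tp=40, fn=10, fp=20, tn=930)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Unlike plain accuracy, all three measures depend on how well the minority class is recovered, which is why they are commonly used when evaluating classifiers on imbalanced data.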

Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 152, 25 March 2015, Pages 429–443
Authors
, , , ,