Article code: 388256
Journal code: 660921
Publication year: 2012
English article: 6 pages (PDF)
English title of the ISI article
Comparing alternative classifiers for database marketing: The case of imbalanced datasets
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Many algorithms exist for binary classification, in which each case is assigned to one of two non-overlapping classes. The area under the receiver operating characteristic (ROC) curve is the most widely used metric for evaluating the performance of alternative binary classifiers. In this study, we consider application domains in which a high degree of class imbalance is the main characteristic and identifying the minority class matters more. For such domains, we show that hit-rate-based measures are more appropriate for assessing model performance and that they should be measured on out-of-time samples. We also try to identify the optimum composition of the training set. Logistic regression, neural network, and CHAID algorithms are implemented for a real marketing problem of a bank, and their performances are compared.


► A binary classification problem with a high degree of imbalance between the classes is undertaken.
► We show that it is better to have more examples of the majority class in the training set.
► We claim and show that hit-rate-based measures are much more meaningful than AUC-based measures in this context.
► The findings are based on a real project carried out for a bank.
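
The contrast between AUC and hit rate drawn in the highlights can be illustrated with a small sketch. The abstract does not define hit rate, so the code assumes it means the share of actual positives among the top-scored fraction of cases; the function names `auc` and `hit_rate_at` and the toy data are hypothetical and are not taken from the paper.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hit_rate_at(labels, scores, fraction):
    """Share of true positives among the top `fraction` of cases by score.

    Assumed definition of hit rate; the abstract does not spell it out.
    """
    k = max(1, round(len(labels) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k


# Toy imbalanced sample: 2 positives (minority class) among 10 cases.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(auc(labels, scores))               # 0.875
print(hit_rate_at(labels, scores, 0.2))  # 0.5
```

In this toy sample the AUC looks strong, yet only half of the cases in the top 20% are true positives. AUC summarizes the ranking over all cases, while the hit rate looks only at the slice that would actually be targeted.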

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 39, Issue 1, January 2012, Pages 48–53