Article code: 380501
Journal code: 1437442
Publication year: 2015
English article: 10 pages, PDF
English title of the ISI article
A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance
Persian translation of the title
یک روش غربالگری ترکیبی جدید برای مجموعه داده های معدن ناپایدار در بانکداری و بیمه
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

In this paper, we propose a novel hybrid approach for rectifying the data imbalance problem by employing k Reverse Nearest Neighborhood and One-Class Support Vector Machine (OCSVM) in tandem. We mined an automobile insurance fraud detection dataset and a customer credit card churn prediction dataset to demonstrate the effectiveness of the proposed model. Throughout the paper, we followed the 10-fold cross-validation method of testing using Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Probabilistic Neural Network (PNN), Group Method of Data Handling (GMDH), and Multi-Layer Perceptron (MLP). We observed that DT and SVM yielded high sensitivities of 90.74% and 91.89%, respectively, on the insurance dataset, and that DT, SVM, and GMDH produced high sensitivities of 91.2%, 87.7%, and 83.1%, respectively, on the credit card churn prediction dataset. For the insurance fraud detection dataset, we found no statistically significant difference between DT (J48) and SVM. Because DT yields "if-then" rules, we prefer DT over SVM. For the churn prediction dataset, GMDH, SVM, and LR turned out not to be statistically different, and GMDH yielded a very high area under the ROC curve (AUC). Further, DT yielded just 4 "if-then" rules on the insurance dataset and 10 rules on the churn prediction dataset, which is the significant outcome of the study.
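
The abstract names k reverse nearest neighbors as one half of the undersampling step but does not say how it is applied. As a rough illustration of the idea only, the sketch below counts, for each majority-class point, how many other points list it among their own k nearest neighbors, and drops points that nobody lists. The function names, the `min_count` threshold, and the toy data are assumptions made for this example; the One-Class SVM stage and the paper's actual procedure are not reproduced here.

```python
from math import dist


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points have it among their k nearest neighbors."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def drop_isolated(points, k, min_count=1):
    """Keep only points that are a k-nearest neighbor of at least min_count other points.

    min_count is a hypothetical threshold chosen for illustration.
    """
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Toy majority-class sample: a tight cluster plus one far-away point.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = drop_isolated(majority, k=2)
print(kept)  # the isolated point (5.0, 5.0) is dropped
```

A point with an empty reverse-neighbor set is far from every local cluster, which is why this count is a common outlier signal. Under the assumption above, the filtered majority class would then be passed on to the second stage.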

Publisher
Database: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence - Volume 37, January 2015, Pages 368–377