کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
4948132 1439609 2016 26 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Probabilistic neural network based categorical data imputation
ترجمه فارسی عنوان
محاسبه داده های طبقه بندی متمرکز بر شبکه های احتمالی شبکه عصبی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
Real world datasets contain both numerical and categorical attributes. Very often missing values are present in both numerical and categorical attributes. The missing data has to be imputed as the inferences made from complete data are often more accurate and reliable than those made from incomplete data [15]. Also, most of the data mining algorithms cannot work with incomplete datasets. The paper proposes a novel soft computing architecture for categorical data imputation. The proposed imputation technique employs Probabilistic Neural Network (PNN) preceded by mode for imputing the missing categorical data. The effectiveness of the proposed imputation technique is tested on 4 benchmark datasets under the 10 fold-cross validation framework. In all datasets, except Mushroom, which are complete, some values, which are randomly removed, are treated as missing values. The performance of the proposed imputation technique is compared with that of 3 statistical and 3 machine learning methods for data imputation. The comparison of the mode+PNN imputation technique with mode, K-Nearest Neighbor (K-NN), Hot Deck (HD), Naive Bayes, Random Forest (RF) and J48 (Decision Tree) imputation techniques demonstrates that the proposed method is efficient, especially when the percentage of missing values is high, for records having more than one missing value and for records having a large number of categories for each categorical variable.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Neurocomputing - Volume 218, 19 December 2016, Pages 17-25
نویسندگان
, ,