Article code: 393255
Journal code: 665586
Publication year: 2015
English article: 19-page PDF
Full-text version: Free download
English title of the ISI article
Tree-based prediction on incomplete data using imputation or surrogate decisions
Keywords
Prediction; Missing data; Surrogate decision; Multiple imputation; Conditional inference tree
Related topics
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract

The goal is to investigate the prediction performance of tree-based techniques when the available training data contain features with missing values. Future test cases may also contain missing values, so the methods must be able to generate predictions for such cases. The missing values are handled either by using surrogate decisions within the trees or by combining an imputation method with a tree-based method. Missing values generated according to missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) mechanisms are considered with various fractions of missing data. Imputation models are built in the learning phase and do not make use of the response variable, so that the resulting procedures can predict individual incomplete test cases. In the empirical comparison, both classification and regression problems are considered using simulated and real-life datasets. Performance is evaluated by the misclassification rate of predictions and the mean squared prediction error, respectively. Overall, our results show that for smaller fractions of missing data an ensemble method combined with surrogates or single imputation suffices. For moderate to large fractions of missing values, ensemble methods based on conditional inference trees combined with multiple imputation show the best performance, while conditional bagging using surrogates is a good alternative for high-dimensional prediction problems. Theoretical results confirm the potentially better prediction performance of multiple imputation ensembles.
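
As a rough illustration of two ideas in the abstract, the sketch below generates missing values under the three mechanisms and then fits a single-imputation model on training features only, so that an individual incomplete test case can be completed before prediction. Everything here is an assumption made for illustration: the names `make_missing` and `MeanImputer`, the median-threshold rule for MAR and MNAR, and the use of column means as the imputation model are not taken from the paper, whose actual imputation methods are not described in the abstract.

```python
import random
from statistics import mean, median


def make_missing(rows, target, mechanism, rate, driver=None, seed=0):
    """Return a copy of `rows` with None placed in column `target`.

    MCAR: every value is dropped with the same probability `rate`.
    MAR:  the drop probability depends on another, observed column `driver`.
    MNAR: the drop probability depends on the value that goes missing.
    The threshold rule used for MAR/MNAR is a hypothetical choice.
    """
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    if mechanism == "MCAR":
        probs = [rate] * len(rows)
    elif mechanism in ("MAR", "MNAR"):
        col = driver if mechanism == "MAR" else target
        values = [r[col] for r in rows]
        cut = median(values)
        # Rows above the median are dropped with probability 2 * rate,
        # the others never, so the overall fraction is roughly `rate`.
        probs = [min(1.0, 2 * rate) if v > cut else 0.0 for v in values]
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
    for row, p in zip(out, probs):
        if rng.random() < p:
            row[target] = None
    return out


class MeanImputer:
    """Single imputation learned from training features only.

    The response variable is never passed in, so the fitted model can
    complete one incomplete test case at prediction time. Assumes every
    column has at least one observed training value.
    """

    def fit(self, X):
        n_cols = len(X[0])
        self.means_ = [
            mean(row[j] for row in X if row[j] is not None)
            for j in range(n_cols)
        ]
        return self

    def transform_one(self, row):
        # Replace each missing entry by the training mean of its column.
        return [self.means_[j] if v is None else v for j, v in enumerate(row)]


# Simulated two-feature training set; the response is deliberately absent.
data_rng = random.Random(1)
X_train = [[data_rng.gauss(0, 1), data_rng.gauss(5, 2)] for _ in range(200)]

X_mcar = make_missing(X_train, target=1, mechanism="MCAR", rate=0.2)
X_mar = make_missing(X_train, target=1, mechanism="MAR", rate=0.2, driver=0)
X_mnar = make_missing(X_train, target=1, mechanism="MNAR", rate=0.2)

imputer = MeanImputer().fit(X_mar)            # learning phase
completed = imputer.transform_one([0.3, None])  # one incomplete test case
```

The completed case would then be passed to whatever tree-based predictor was trained on the imputed training data. A multiple-imputation variant would draw several completed versions of each case and combine the resulting predictions, which this sketch does not attempt.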

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 311, 1 August 2015, Pages 163–181