Article code: 461749
Journal code: 696628
Publication year: 2012
English article: 12-page PDF
English title of the ISI article
Nearest neighbor selection for iteratively kNN imputation
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Networks and Communications
English abstract

Existing kNN imputation methods for dealing with missing data are designed around the Minkowski distance or its variants, and have been shown to be generally efficient for numerical variables (features, or attributes). To deal with heterogeneous (i.e., mixed-attribute) data, we propose a novel kNN (k nearest neighbor) imputation method that iteratively imputes missing data, named GkNN (gray kNN) imputation. GkNN selects the k nearest neighbors for each missing datum by calculating the gray distance between the missing datum and all the training data, rather than using a traditional distance metric such as the Euclidean distance. Such a distance metric can deal with both numerical and categorical attributes. To improve effectiveness, GkNN regards all imputed instances (i.e., instances whose missing data have been imputed) as observed data and uses them together with complete instances (instances without missing values) to iteratively impute other missing data. We experimentally evaluate the proposed approach and demonstrate that the gray distance is much better than the Minkowski distance at both capturing the proximity relationship (or nearness) of two instances and dealing with mixed attributes. Moreover, the experimental results show that the GkNN algorithm is much more efficient than existing kNN imputation methods.
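The abstract does not state the gray distance formula, so the following is only a minimal sketch of how a gray-relational similarity over mixed attributes might look. It assumes the conventional gray relational coefficient from gray relational analysis with distinguishing coefficient rho = 0.5, a 0/1 mismatch for categorical attributes, and a range-normalized difference for numerical ones. The names `attribute_gap`, `gray_relational_grades`, and `spans` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def attribute_gap(a: Value, b: Value, span: float) -> float:
    """Per-attribute gap in [0, 1] (an assumed encoding, not the paper's).

    Categorical values (strings) contribute 0 on a match and 1 otherwise;
    numerical values contribute their absolute difference divided by the
    attribute's range `span`.
    """
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return abs(a - b) / span if span > 0 else 0.0


def gray_relational_grades(
    query: Sequence[Value],
    candidates: Sequence[Sequence[Value]],
    spans: Sequence[float],
    rho: float = 0.5,
) -> List[float]:
    """Return one similarity grade per candidate (higher means nearer).

    `query` and each candidate hold only the attributes observed in the
    query instance. `candidates` is assumed to be non-empty.
    """
    gaps = [
        [attribute_gap(q, c, s) for q, c, s in zip(query, cand, spans)]
        for cand in candidates
    ]
    flat = [g for row in gaps for g in row]
    g_min, g_max = min(flat), max(flat)
    if g_max == 0:
        # Every candidate is identical to the query on the observed attributes.
        return [1.0] * len(candidates)
    grades = []
    for row in gaps:
        # Gray relational coefficient per attribute, averaged into a grade.
        coeffs = [(g_min + rho * g_max) / (g + rho * g_max) for g in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


query = (1.0, "a")
candidates = [(1.0, "a"), (2.0, "b"), (1.5, "a")]
grades = gray_relational_grades(query, candidates, spans=(1.0, 0.0))
```

Under these assumptions the k nearest neighbors would be the k candidates with the largest grades; the span entry for a categorical attribute is ignored.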


► A nearest neighbor selection method, named GkNN (gray kNN) imputation, is proposed for iterative kNN imputation of missing data.
► GkNN uses all imputed instances as observed data, together with complete instances (instances without missing values), in subsequent imputation iterations.
► The GkNN algorithm is extended to impute heterogeneous datasets that contain both numerical and categorical attributes.
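The idea of treating imputed instances as observed data can be illustrated with a short single-pass sketch. It is not the authors' algorithm: the similarity below is a simple placeholder rather than the gray distance, missing values are assumed to be encoded as `None`, numerical gaps are filled with the neighbors' mean and categorical gaps with their mode, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Union

Value = Union[float, str, None]
Row = List[Value]


def similarity(a: Row, b: Row, observed: List[int]) -> float:
    """Placeholder mixed-attribute similarity (NOT the paper's gray distance).

    Assumes `observed` is non-empty and each column has a consistent type.
    """
    score = 0.0
    for i in observed:
        if isinstance(a[i], str):
            score += 1.0 if a[i] == b[i] else 0.0
        else:
            score += 1.0 / (1.0 + abs(a[i] - b[i]))
    return score / len(observed)


def impute_with_growing_pool(rows: List[Row], k: int = 2) -> List[Row]:
    """Impute rows in order, adding each imputed row to the neighbor pool.

    Assumes at least one complete row exists to seed the pool.
    """
    pool = [list(r) for r in rows if None not in r]
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # The k most similar pool members, compared on observed attributes only.
        neighbors = sorted(
            pool, key=lambda p: similarity(row, p, observed), reverse=True
        )[:k]
        filled = list(row)
        for i, v in enumerate(row):
            if v is None:
                values = [n[i] for n in neighbors]
                if isinstance(values[0], str):
                    filled[i] = Counter(values).most_common(1)[0][0]  # mode
                else:
                    filled[i] = sum(values) / len(values)  # mean
        pool.append(filled)  # the imputed row now counts as observed data
        result.append(filled)
    return result


rows = [
    [1.0, "a", 10.0],
    [2.0, "b", 20.0],
    [1.1, "a", None],
    [1.2, None, None],
]
imputed = impute_with_growing_pool(rows, k=1)
```

In this example the last row's nearest neighbor is the row imputed just before it, which shows how previously imputed instances can take part in later imputations.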

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 85, Issue 11, November 2012, Pages 2541–2552
Authors
,