Article code: 393936
Journal code: 665712
Publication year: 2014
English article: 19-page PDF
English title of the ISI article
An effective discretization method for disposing high-dimensional data
Persian translation of the title
An effective method for bypassing high-dimensional information
Related topics
Engineering and Basic Sciences / Computer Engineering / Artificial Intelligence
Abstract

Feature discretization is an extremely important preprocessing task for classification in data mining and machine learning, because many classification methods require each dimension of the training dataset to contain only discrete values. Most discretization methods concentrate on discretizing low-dimensional data. In this paper, we focus on discretizing high-dimensional data, which frequently exhibit nonlinear structure. First, we present a novel supervised dimension reduction algorithm that maps high-dimensional data into a low-dimensional space while preserving the intrinsic correlation structure of the original data. This algorithm overcomes the deficiency that the geometric topology of the data is easily distorted when mapping data that are unevenly distributed in the high-dimensional space. To the best of our knowledge, this is the first approach to solve high-dimensional nonlinear data discretization with a dimension reduction technique. Second, we propose a supervised area-based chi-square discretization algorithm to effectively discretize each continuous dimension in the low-dimensional space. This algorithm overcomes a deficiency of existing methods, which do not consider, from a probabilistic point of view, how likely each pair of intervals is to be merged. Finally, we conduct experiments to evaluate the performance of the proposed method. The results show that, compared with existing discretization methods, our method achieves higher classification accuracy and yields more concise knowledge of the data, especially on high-dimensional datasets. In addition, our discretization method has been successfully applied to computer vision and image classification.
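The abstract does not say how the area-based chi-square criterion is computed. For orientation only, the sketch below implements the classic ChiMerge algorithm, a widely used bottom-up chi-square discretization baseline. It is not the authors' area-based method, and it covers only the per-dimension discretization step, not the dimension reduction. The helper names (`chi2_pair`, `chimerge`, `discretize`), the `threshold` and `max_intervals` parameters, and the convention of returning interval lower bounds as cut points are all hypothetical choices made for illustration. A common choice of `threshold` is the chi-square critical value with k - 1 degrees of freedom for k classes, e.g. about 2.706 for two classes at the 0.90 level.

```python
from bisect import bisect_right
from collections import Counter


def chi2_pair(counts_a, counts_b, classes):
    """Pearson chi-square statistic of the 2 x k table formed by two adjacent intervals."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    n = total_a + total_b
    chi2 = 0.0
    for c in classes:
        col = counts_a[c] + counts_b[c]
        if col == 0:
            # Class absent from both intervals: expected and observed are both 0,
            # so this sketch treats the 0/0 term as contributing nothing.
            continue
        for counts, row_total in ((counts_a, total_a), (counts_b, total_b)):
            expected = row_total * col / n
            chi2 += (counts[c] - expected) ** 2 / expected
    return chi2


def chimerge(values, labels, threshold, max_intervals=None):
    """Classic bottom-up ChiMerge on one continuous feature (illustrative sketch).

    Returns the sorted lower bounds of every interval except the first;
    these act as cut points for discretize().
    """
    classes = sorted(set(labels))
    # Start with one interval per distinct value, holding its class counts.
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, Counter())[y] += 1
    intervals = [(v, by_value[v]) for v in sorted(by_value)]

    while len(intervals) > 1:
        scores = [chi2_pair(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        # Adjacent pair whose class distributions are most similar.
        i = min(range(len(scores)), key=scores.__getitem__)
        too_many = max_intervals is not None and len(intervals) > max_intervals
        # Stop once every adjacent pair differs significantly, unless the
        # (hypothetical) interval budget still forces further merging.
        if scores[i] >= threshold and not too_many:
            break
        lower, counts = intervals[i]
        intervals[i:i + 2] = [(lower, counts + intervals[i + 1][1])]
    return [lower for lower, _ in intervals[1:]]


def discretize(x, cuts):
    """Index of the interval containing x (cuts are interval lower bounds)."""
    return bisect_right(cuts, x)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
    cuts = chimerge(xs, ys, threshold=2.706)  # about the chi-square 0.90 quantile, df = 1
    print(cuts)                                # [5]
    print([discretize(x, cuts) for x in xs])   # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this toy example, the first four values belong to class `a` and the rest to class `b`. Same-class neighbours score 0 and are merged first. The final pair scores 8.0, which exceeds the threshold, so a single cut point at 5 remains. The abstract argues that a purely pairwise statistic like this one ignores the probability of each merge, and the area-based variant is presented as addressing that.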

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 270, 20 June 2014, Pages 73–91