Article code | Journal code | Publication year | English article | Full-text version
533356 | 870109 | 2012 | 13-page PDF | Free download
English title of the ISI article
An unsupervised approach to feature discretization and selection
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract

Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.


► New feature discretization/selection techniques for high-dimensional data.
► Discretization, based on the Linde–Buzo–Gray algorithm, is combined with FS algorithms.
► New (log-linear-time) FS filter approach, efficient for high-dimensional data.
► Excellent classification results on different types of data (e.g., microarray).
► Efficient and generic criterion to choose an adequate number of features.
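
The highlights name the Linde–Buzo–Gray (LBG) algorithm for discretization and a log-linear-time filter for feature selection, but this listing gives no further algorithmic detail. The following is only a minimal sketch of how those two ideas could fit together, not the authors' method. The function names, the parameters `n_bits`, `eps` and `n_iter`, and the use of variance as the relevance score are all assumptions made for illustration.

```python
import numpy as np

def lbg_quantize(x, n_bits=2, eps=1e-2, n_iter=20):
    """Scalar LBG quantizer (illustrative). Returns (codebook, indices)."""
    x = np.asarray(x, dtype=float)
    codebook = np.array([x.mean()])          # start from a single codeword
    for _ in range(n_bits):                  # each split doubles the codebook
        # Additive perturbation (an assumption) so a zero codeword still splits.
        delta = eps * (x.std() + 1e-12)
        codebook = np.concatenate([codebook - delta, codebook + delta])
        for _ in range(n_iter):              # Lloyd refinement
            idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
            for k in range(codebook.size):
                members = x[idx == k]
                if members.size:             # leave empty cells unchanged
                    codebook[k] = members.mean()
    codebook.sort()
    idx = np.abs(x[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx

def select_top_features(X, m):
    """Keep the m highest-variance columns (variance is a stand-in criterion).

    Scoring is linear in the number of features d and the sort is
    O(d log d), which is where a log-linear cost would come from.
    """
    scores = np.asarray(X, dtype=float).var(axis=0)
    order = np.argsort(scores)[::-1]         # descending relevance
    return np.sort(order[:m])                # indices of the kept features

# Usage: discretize one feature with 1 bit, then rank three features.
codebook, idx = lbg_quantize([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], n_bits=1)
X = np.array([[0, 1, 5], [0, 2, -5], [0, 3, 5], [0, 4, -5]])
kept = select_top_features(X, 2)
```

Both steps are unsupervised: neither uses class labels, so the sketch matches the setting described in the abstract even though the specific criteria are placeholders.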

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 45, Issue 9, September 2012, Pages 3048–3060
Authors
, ,