Article code: 533971
Journal code: 870197
Publication year: 2016
English article: 7 pages, PDF
Full-text version: free download
English title of the ISI article
A hybrid feature selection method based on instance learning and cooperative subset search
Persian translation of the title
یک روش انتخاب ترکیبی بر اساس نمونه یادگیری و جستجوی زیرمجموعه مشارکتی
Keywords
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Vision and Pattern Recognition
English abstract


• A hybrid feature selection method is proposed for classification in small sample size data sets.
• The filter step is based on instance learning, taking advantage of the small sample size of the data.
• Only a few candidate feature subsets are generated, since their number corresponds to the number of instances.
• Cooperative feature subset search is proposed with a classifier algorithm for the wrapper step.
• The proposed method improves classification accuracy and stability of feature selection.

Selecting the most useful features from thousands of candidates in a low sample size data set is a problem that arises in many areas of modern science, and feature subset selection is a key step in such data mining classification tasks. In practice, filter methods are very commonly used. However, they ignore the correlations between genes, which are prevalent in gene expression data. Standard wrapper algorithms, on the other hand, cannot be applied because of their computational complexity. Additionally, existing methods are not specifically designed to handle the small sample size of the data, which is one of the main causes of feature selection instability. To deal with these issues, we propose a new hybrid filter-wrapper approach based on instance learning. Its main idea is to turn the small sample size from a problem into a tool that allows only a few feature subsets to be chosen in a filter step. A cooperative subset search (CSS) is then proposed, with a classifier algorithm serving as the evaluation system of the wrapper. Our method is experimentally tested and compared with state-of-the-art algorithms on several high-dimensional, low sample size cancer data sets. Results show that the proposed approach outperforms the other methods in terms of accuracy and stability of the selected subset.
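
The abstract gives only the outline of the method: one candidate feature subset per instance in the filter step, followed by a classifier-driven cooperative search in the wrapper step. The sketch below illustrates that outline in Python and is not the authors' algorithm. The Relief-style per-instance scoring, the leave-one-out 1-NN evaluator, the greedy merging rule, the toy data, and all function names (`candidate_subset`, `loo_accuracy`, `hybrid_select`) are assumptions made purely for illustration.

```python
import random


def manhattan(a, b, feats):
    """L1 distance between two instances restricted to the given features."""
    return sum(abs(a[f] - b[f]) for f in feats)


def candidate_subset(i, X, y, k):
    """Filter step (assumed): rank features for instance i, Relief-style.

    A feature scores high when it separates instance i from its nearest
    miss (other class) more than from its nearest hit (same class).
    Requires at least two instances per class.
    """
    all_feats = range(len(X[i]))
    others = [j for j in range(len(X)) if j != i]
    hits = [j for j in others if y[j] == y[i]]
    misses = [j for j in others if y[j] != y[i]]
    hit = min(hits, key=lambda j: manhattan(X[i], X[j], all_feats))
    miss = min(misses, key=lambda j: manhattan(X[i], X[j], all_feats))
    score = [abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])
             for f in all_feats]
    ranked = sorted(all_feats, key=lambda f: score[f], reverse=True)
    return frozenset(ranked[:k])


def loo_accuracy(X, y, feats):
    """Wrapper evaluation (assumed): leave-one-out accuracy of 1-NN."""
    feats = sorted(feats)
    correct = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: manhattan(X[i], X[j], feats))
        correct += y[nearest] == y[i]
    return correct / len(X)


def hybrid_select(X, y, k=2):
    """Filter then wrapper: at most one candidate subset per instance."""
    candidates = {candidate_subset(i, X, y, k) for i in range(len(X))}
    # Evaluate every candidate with the classifier, best first.
    ranked = sorted(candidates,
                    key=lambda s: (-loo_accuracy(X, y, s), sorted(s)))
    best = set(ranked[0])
    best_acc = loo_accuracy(X, y, best)
    # Assumed "cooperation" rule: merge a candidate only if the union helps.
    for s in ranked[1:]:
        merged = best | s
        acc = loo_accuracy(X, y, merged)
        if acc > best_acc:
            best, best_acc = merged, acc
    return sorted(best), best_acc


# Toy high-dimensional, low sample size data: 12 instances, 20 features.
# Only feature 0 carries class information; the rest is uniform noise.
rng = random.Random(0)
y = [0] * 6 + [1] * 6
X = [[10 * label + rng.random()] + [rng.random() for _ in range(19)]
     for label in y]

selected, acc = hybrid_select(X, y, k=2)
print("selected features:", selected, "LOO accuracy:", acc)
```

Because the filter step yields at most as many candidate subsets as there are instances, the wrapper has to evaluate only a handful of subsets, which is what keeps a classifier-based search affordable in this setting.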

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 69, 1 January 2016, Pages 28–34
Authors
, ,