کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6868321 1439957 2018 32 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
ترجمه فارسی عنوان
بهبود عملکرد طبقه بندی با استفاده از الگوریتم انتخاب ویژگی انتخاب رشته برای داده کاوی
کلمات کلیدی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی
This study focuses on feature subset selection from high dimensionality databases and presents modification to the existing Random Subset Feature Selection (RSFS) algorithm for the random selection of feature subsets and for improving stability. A standard k-nearest-neighbor (kNN) classifier is used for classification. The RSFS algorithm is used for reducing the dimensionality of a data set by selecting useful novel features. It is based on the random forest algorithm. The current implementation suffers from poor dimensionality reduction and low stability when the database is very large. In this study, an attempt is made to improve the existing algorithm's performance for dimensionality reduction and increase its stability. The proposed algorithm was applied to scientific data to test its performance. With 10 fold cross-validation and modifying the algorithm classification accuracy is improved. The applications of the improved algorithm are presented and discussed in detail. From the results it is concluded that the improved algorithm is superior in reducing the dimensionality and improving the classification accuracy when used with a simple kNN classifier. The data sets are selected from public repository. The datasets are scientific in nature and mostly used in cancer detection. From the results it is concluded that the algorithm is highly recommended for dimensionality reduction while extracting relevant data from scientific datasets.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Big Data Research - Volume 12, July 2018, Pages 1-12
نویسندگان
, ,