Article code: 10326425
Journal code: 678070
Publication year: 2016
English article: 30-page PDF
Full-text version: Free download
English title of the ISI article
Unsupervised feature selection through Gram-Schmidt orthogonalization-A word co-occurrence perspective
Keywords
Feature selection, random projection, Gram-Schmidt orthogonalization, basis features, word co-occurrence matrix
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract
Feature selection is a key step in many machine learning applications, such as categorization and clustering. For text data in particular, the original document-term matrix is high-dimensional and sparse, which degrades the performance of feature selection algorithms. Meanwhile, labeling training instances is time-consuming and expensive, so unsupervised feature selection algorithms have attracted growing attention. In this paper, we propose an unsupervised feature selection algorithm based on Random Projection and Gram-Schmidt Orthogonalization (RP-GSO) applied to the word co-occurrence matrix. The RP-GSO algorithm has three advantages: (1) it takes a dense word co-occurrence matrix as input, avoiding the sparseness of the original document-term matrix; (2) it selects "basis features" through the Gram-Schmidt (GS) process, guaranteeing the orthogonality of the feature space; and (3) it adopts random projection to speed up the GS process. Extensive experimental results show that the proposed RP-GSO approach outperforms the supervised and unsupervised feature selection methods it was compared with in text classification and clustering tasks.
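The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation; the abstract does not give the details, so several choices here are assumptions: the co-occurrence matrix is taken as C = X^T X for a document-term matrix X, the random projection uses a Gaussian matrix, and "basis features" are picked greedily by largest residual norm (a pivoted Gram-Schmidt). The function names cooccurrence, random_project and select_basis_features are hypothetical.

```python
import math
import random


def cooccurrence(doc_term):
    """Assumed definition: C = X^T X, where rows of X are documents."""
    n_terms = len(doc_term[0])
    C = [[0.0] * n_terms for _ in range(n_terms)]
    for row in doc_term:
        nonzero = [(j, v) for j, v in enumerate(row) if v]
        for i, vi in nonzero:
            for j, vj in nonzero:
                C[i][j] += vi * vj
    return C


def random_project(vectors, k, seed=0):
    """Map each vector to k dimensions with a Gaussian random matrix (assumed choice)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(v[i] * R[i][c] for i in range(d)) for c in range(k)] for v in vectors]


def select_basis_features(vectors, n_select, tol=1e-9):
    """Greedy Gram-Schmidt: pick the vector with the largest residual norm,
    then subtract its direction from every residual (assumed pivot rule)."""
    residual = [list(v) for v in vectors]
    selected = []
    for _ in range(min(n_select, len(residual))):
        norms = [
            0.0 if i in selected else math.sqrt(sum(x * x for x in r))
            for i, r in enumerate(residual)
        ]
        best = max(range(len(residual)), key=lambda i: norms[i])
        if norms[best] <= tol:
            break  # the remaining features are linearly dependent on those chosen
        q = [x / norms[best] for x in residual[best]]
        selected.append(best)
        for i, r in enumerate(residual):
            dot = sum(a * b for a, b in zip(r, q))
            residual[i] = [a - dot * b for a, b in zip(r, q)]
    return selected


# Toy corpus: terms 0 and 1 always occur together, term 2 occurs alone.
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
C = cooccurrence(X)
print(select_basis_features(C, 3))                     # → [0, 2]
print(select_basis_features(random_project(C, 2), 2))  # indices chosen after projection
```

In the toy corpus, term 1 is redundant with term 0, so the Gram-Schmidt step leaves it with a near-zero residual and it is never selected. Projecting to k dimensions first shortens every inner product in the selection loop, which is the speed-up the abstract attributes to random projection.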
Publisher
Database: Elsevier - ScienceDirect
Journal: Neurocomputing - Volume 173, Part 3, 15 January 2016, Pages 845-854