Article code | Journal code | Publication year | English article | Full-text version
4960622 | 1446503 | 2017 | 10-page PDF | Free download
English title of the ISI article
Integrating Low-rank Approximation and Word Embedding for Feature Transformation in the High-dimensional Text Classification
Persian translation of the title
یکپارچه سازی تقریبی نزولی و جاسازی ورد برای تبدیل ویژگی در طبقه بندی متن با ابعاد بزرگ
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

With the Bag-of-Words model, a document corpus can initially be represented by a Terms-Documents matrix. However, the high-dimensional pure Terms-Documents matrix needs to be transformed into a lower-dimensional semantic Concepts-Documents matrix in order to not only reduce the feature space dimension but also create more meaningful features. This paper analyzes two feature transformation (FT) models on the Terms-Documents matrix, i.e. the FT model based on Low-Rank Approximation (LRA) and the FT model based on Word Embedding (WE). Each has its own strengths and weaknesses in text transformation. The LRA-based FT focuses only on the mathematical perspective, statistically covering the original dispersed term set of the corpus as well as possible, while the WE-based FT utilizes the available word embedding vectors to enhance the contextual content of the corpus representation. Therefore, combinations of the LRA-based FT and the WE-based FT, named LRAintoWE-based FT and WEintoLRA-based FT, are proposed to obtain comprehensive FTs that appropriately capture both the statistical information and the contextual information. The experimental results on three benchmark datasets show that the information of the WE-based FT and the LRA-based FT can be integrated, and that their integration as LRAintoWE-based FT and WEintoLRA-based FT can improve classification performance compared with using either of them alone.
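
To make the two base transformations in the abstract concrete, the following is a minimal sketch in Python with NumPy, not the paper's implementation. It assumes the LRA-based FT can be illustrated with a truncated SVD projection and the WE-based FT with a product of a word embedding matrix and the Terms-Documents matrix. The function names `lra_transform` and `we_transform`, the toy matrix `X`, and the random embeddings `E` are all hypothetical. The abstract does not specify how LRAintoWE and WEintoLRA combine the two, so that step is not shown.

```python
import numpy as np

def lra_transform(X, k):
    """Illustrative LRA-based FT: project a terms x documents matrix
    onto its top-k left singular vectors (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :k]          # terms x k basis of "concepts"
    return U_k.T @ X        # k concepts x documents

def we_transform(X, E):
    """Illustrative WE-based FT: E holds one d-dimensional embedding
    per term (terms x d), so E.T @ X is d x documents."""
    return E.T @ X

# Toy data (assumed for illustration only): 6 terms, 4 documents.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(6, 4)).astype(float)
E = rng.normal(size=(6, 3))     # made-up 3-d embeddings, not pretrained vectors

C_lra = lra_transform(X, k=2)   # 2 concepts x 4 documents
C_we = we_transform(X, E)       # 3 embedding dimensions x 4 documents
```

In both cases each document column shrinks from one entry per term to a handful of concept features. In practice `E` would come from pretrained word vectors rather than random numbers.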

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia Computer Science - Volume 112, 2017, Pages 437-446