Article code: 10145943
Journal code: 1646379
Publication year: 2019
English article: 23 pages, PDF
English title of the ISI article
An analysis of hierarchical text classification using word embeddings
Persian translation of the title
تجزیه و تحلیل طبقه بندی متن سلسله مراتبی با استفاده از واژه های جادویی
Related topics
Engineering and Basic Sciences / Computer Engineering / Artificial Intelligence
English abstract
Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations (fastText, XGBoost, SVM, and Keras' CNN) and noticeable word embedding generation methods (GloVe, word2vec, and fastText) on publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an lcaF1 of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their flavors is a very promising approach for HTC.
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 471, January 2019, Pages 216-232