Article code | Journal code | Publication year | English article
6861498 | 1439252 | 2018 | 30-page PDF
English title of the ISI article
An automated text categorization framework based on hyperparameter optimization
Related subjects
Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence
Abstract
A wide variety of text tasks, such as topic or spam identification, user profiling, and sentiment analysis, can be posed as supervised learning problems and tackled using a text classifier. A text classifier consists of several subprocesses; some are general enough to be applied to any supervised learning problem, whereas others are specifically designed for a particular task and rely on complex, computationally expensive processes such as lemmatization and syntactic analysis. Contrary to traditional approaches, we propose a minimalist, multipurpose text classifier able to tackle tasks independently of domain and language. We named our approach μTC. It is composed of several easy-to-implement text transformations, text representations, and a supervised learning algorithm. Together, these pieces produce a competitive classifier in several challenging domains, such as informally written text. We provide a detailed description of μTC along with an extensive experimental comparison against relevant state-of-the-art methods on 30 different datasets. In terms of accuracy, μTC obtained the best performance on 20 datasets and achieved competitive results on the remaining ones. The compared datasets cover problems such as topic and polarity classification, spam detection, user profiling, and authorship attribution. Furthermore, our approach allows the technology to be used even without in-depth knowledge of machine learning and natural language processing.
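The abstract describes μTC as a combination of simple text transformations, a choice of text representation, and a supervised learner, with the configuration selected automatically (hyperparameter optimization, per the title). The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' implementation. Everything named here is hypothetical: the configuration space `SPACE` and its keys, the TF-IDF weighting, the nearest-centroid classifier (swapped in for simplicity as the "supervised learning algorithm"), and the plain random search in `search`.

```python
import itertools
import math
import random
import string
import unicodedata
from collections import Counter

# Hypothetical configuration space: each key is one tunable choice.
SPACE = {
    "lower": [True, False],          # lowercase the text
    "del_punc": [True, False],       # strip ASCII punctuation
    "del_diac": [True, False],       # strip diacritics (é -> e)
    "tokenizer": [("word", 1), ("word", 2), ("char", 3), ("char", 4)],
}


def normalize(text, cfg):
    """Apply the simple text transformations selected in cfg."""
    if cfg["lower"]:
        text = text.lower()
    if cfg["del_diac"]:
        # NFD splits accented letters into base letter + combining mark (category Mn).
        text = "".join(c for c in unicodedata.normalize("NFD", text)
                       if unicodedata.category(c) != "Mn")
    if cfg["del_punc"]:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


def tokenize(text, cfg):
    """Word n-grams or character q-grams, depending on cfg["tokenizer"]."""
    kind, n = cfg["tokenizer"]
    if kind == "word":
        words = text.split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fit_idf(token_lists):
    """Smoothed inverse document frequency computed on the training set."""
    df = Counter(tok for toks in token_lists for tok in set(toks))
    n = len(token_lists)
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def vectorize(tokens, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen in training are ignored."""
    tf = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train(texts, labels, cfg):
    """Nearest-centroid model: one summed TF-IDF vector per class."""
    token_lists = [tokenize(normalize(t, cfg), cfg) for t in texts]
    idf = fit_idf(token_lists)
    centroids = {}
    for toks, y in zip(token_lists, labels):
        centroid = centroids.setdefault(y, Counter())
        for tok, v in vectorize(toks, idf).items():
            centroid[tok] += v
    return cfg, idf, centroids


def predict(model, text):
    """Return the class whose centroid has the highest cosine similarity."""
    cfg, idf, centroids = model
    vec = vectorize(tokenize(normalize(text, cfg), cfg), idf)

    def cosine(centroid):
        norm = math.sqrt(sum(v * v for v in centroid.values())) or 1.0
        return sum(v * centroid.get(tok, 0.0) for tok, v in vec.items()) / norm

    return max(centroids, key=lambda y: cosine(centroids[y]))


def accuracy(model, texts, labels):
    return sum(predict(model, t) == y for t, y in zip(texts, labels)) / len(labels)


def search(train_x, train_y, val_x, val_y, budget=16, seed=0):
    """Random search: sample up to `budget` configurations, keep the best on validation."""
    keys = list(SPACE)
    grid = [dict(zip(keys, values)) for values in itertools.product(*SPACE.values())]
    rng = random.Random(seed)
    best = None
    for cfg in rng.sample(grid, min(budget, len(grid))):
        model = train(train_x, train_y, cfg)
        acc = accuracy(model, val_x, val_y)
        if best is None or acc > best[0]:
            best = (acc, model)
    return best


if __name__ == "__main__":
    train_x = ["I love this movie!", "Great acting, wonderful plot.",
               "Terrible film, awful plot.", "I hate this boring movie."]
    train_y = ["pos", "pos", "neg", "neg"]
    acc, model = search(train_x, train_y, ["A wonderful movie"], ["pos"])
    print("validation accuracy:", acc, "config:", model[0])
    print(predict(model, "what a boring plot"))
```

Random search is used here only because it is the simplest optimizer to show; with a larger configuration space a smarter search would likely be preferable, and a real evaluation would use cross-validation rather than a tiny validation set.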
Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 149, 1 June 2018, Pages 110-123
Authors
, , , ,