Article code: 4943374
Journal code: 1437625
Publication year: 2017
English article: 32 pages, PDF
Full-text version: free download
English title of the ISI article
A case study of Spanish text transformations for twitter sentiment analysis
Persian translation of the title
مطالعه موردی تحولات متن اسپانیایی برای تجزیه و تحلیل احساسات توییتر
Keywords
Sentiment analysis, error-robust text representations, opinion mining
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
The aim of this research is to identify in a large set of combinations which text transformations (lemmatization, stemming, entity removal, among others), tokenizers (e.g., word n-grams), and token-weighting schemes make the most impact on the accuracy of a classifier (Support Vector Machine) trained on two Spanish datasets. The methodology used is to exhaustively analyze all combinations of text transformations and their respective parameters to find out what common characteristics the best performing classifiers have. Furthermore, we introduce a novel approach based on the combination of word-based n-grams and character-based q-grams. The results show that this novel combination of words and characters produces a classifier that outperforms the traditional word-based combination by 11.17% and 5.62% on the INEGI and TASS'15 datasets, respectively.
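The combination of word-based n-grams and character-based q-grams mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `w:` / `q:` prefixes, and the default sizes (word unigrams and bigrams, character 3-grams) are assumptions chosen for illustration, and the token-weighting and Support Vector Machine training steps are left out.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams of a lowercased, whitespace-tokenized text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def char_qgrams(text, q):
    """Return character q-grams of the lowercased text, spaces included."""
    s = " ".join(text.lower().split())  # collapse runs of whitespace
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def combined_tokens(text, word_sizes=(1, 2), char_sizes=(3,)):
    """Bag of tokens mixing word n-grams and character q-grams.

    The "w:" and "q:" prefixes (a hypothetical convention) keep the two
    token families from colliding in the same vocabulary.
    """
    bag = Counter()
    for n in word_sizes:
        bag.update("w:" + g for g in word_ngrams(text, n))
    for q in char_sizes:
        bag.update("q:" + g for g in char_qgrams(text, q))
    return bag


bag = combined_tokens("Muy buen dia")
```

Character q-grams overlap across word boundaries, so a misspelled word still shares most of its q-grams with the correct form, which is one plausible reason such a mixed representation could help on noisy tweets.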
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 81, 15 September 2017, Pages 457-471