کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
6854943 1437600 2018 46 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach
ترجمه فارسی عنوان
یک واژگان قابل انتقال برای سنجش احساسات توییتر با استفاده از روش یادگیری دستگاه نظارت شده
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
چکیده انگلیسی
The Twitter messaging service has become a platform for customers and news consumers to express sentiments. Accurately capturing these sentiments has been challenging for researchers. The traditional approaches to Twitter Sentiment Analysis (TSA) include dictionary-based and use supervised machine learning tools for sentiment classification. This research follows the supervised machine learning approach. A major challenge for the machine learning approach is feature selection, which is often domain dependent. We address this specific challenge and present a novel approach to identify a lexicon set unique to TSA. We show that this Twitter Specific Lexicon Set (TSLS) is small, and most importantly, is domain transferable. This identification process generates a collection of vectorized tweets for input to machine learning tools. In traditional approaches, this vectorization often results in a highly sparse input matrix which produces low accuracy measures. In this research, we hierarchically reduce the feature set to a small set of seven “meta features” to reduce sparsity. We show that TSA based on these features can produce highly accurate results using a dynamic architecture for neural networks (DAN2) and SVM (machine learning tools) as measured by recall, precision, and F1 metrics (the harmonic average of precision and recall). Our results show that a Twitter Generic Feature Set (TGFS) derived from two datasets (@JustinBieber and @Starbucks) is domain transferable and when combined with only a few Twitter Domain Specific Features (TDSF) (less than 3%), can produce excellent sentiment classification values. We evaluate the effectiveness and transferability of the TGFS across three new and distinct domains (@GovChristie, @SouthwestAir, and @VerizonWireless).
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 106, 15 September 2018, Pages 197-216
نویسندگان
, ,