Article code: 515840
Journal code: 867108
Publication year: 2014
English article: 15-page PDF
Full text: free download
English title of the ISI article
Bi-view semi-supervised active learning for cross-lingual sentiment classification
Keywords
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Software
English abstract


• We combine active and semi-supervised learning for cross-lingual sentiment classification.
• Density analysis of unlabeled data is used in active learning.
• We test our proposed model on three different languages.
• This model reduces manual labelling effort in cross-lingual sentiment classification.
• Results show that incorporating density analysis can speed up the learning process.
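
The role of density analysis in active learning can be illustrated with a small sketch. It is not the paper's formulation: the scoring rule (binary entropy multiplied by average cosine similarity) and the names `entropy`, `density`, and `select_queries` are assumptions chosen for illustration. The idea shown is that an uncertain example lying far from the rest of the unlabelled data is ranked below an almost equally uncertain example in a dense region.

```python
import math

def entropy(p_pos):
    """Binary entropy (in bits) of a predicted positive-class probability."""
    if p_pos <= 0.0 or p_pos >= 1.0:
        return 0.0
    return -(p_pos * math.log2(p_pos) + (1 - p_pos) * math.log2(1 - p_pos))

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def density(i, vectors):
    """Average similarity of example i to every other unlabelled example."""
    sims = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_queries(vectors, p_pos, k=1):
    """Return the k indices with the highest uncertainty * density score.

    Assumed scoring rule for illustration, not taken from the paper.
    """
    scores = [entropy(p) * density(i, vectors) for i, p in enumerate(p_pos)]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
```

Calling `select_queries(vectors, p_pos, k)` with one feature vector and one predicted probability per unlabelled document returns the indices to send to a human annotator. Ranking by `entropy` alone would favour the most uncertain document even when it is an outlier.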

Recently, sentiment classification has received considerable attention within the natural language processing research community. However, because most recent work on sentiment classification has been done for English, sentiment resources in other languages remain scarce. Manual construction of reliable sentiment resources is difficult and time-consuming. Cross-lingual sentiment classification aims to use annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing work relies on automatic machine translation services to project information directly from one language to another. However, relying on machine translation alone faces two main problems: the term distributions of original and translated documents differ, and the translation introduces errors. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training that incorporates unlabelled data from the target language into the learning process in a bi-view framework. In an iterative process, the model enriches the training data by adding the most confident automatically labelled examples, as well as a few of the most informative manually labelled examples, from the unlabelled data. Further, the model considers the density of the unlabelled data in order to select more representative unlabelled examples and avoid selecting outliers in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.
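
The iterative process described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the squared-distance margin used as confidence, and the names `train`, `margin`, `co_train_active`, `predict`, and `oracle` are all hypothetical. Each document is assumed to carry two feature vectors (for example, one from the target-language text and one from its English translation), and the density weighting is left out to keep the loop short.

```python
def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(X, y):
    """Stand-in classifier: one centroid per class (labels are 0 or 1)."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}

def margin(model, x):
    """Positive values favour class 1; the magnitude acts as confidence."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    return dist[0] - dist[1]

def fit_views(labelled):
    """Train one classifier per view on the current labelled set."""
    labels = [lab for _, lab in labelled]
    return [train([views[k] for views, _ in labelled], labels) for k in (0, 1)]

def co_train_active(labelled, unlabelled, oracle, rounds=3, n_query=1):
    """labelled: [((view_a, view_b), label)]; unlabelled: [(view_a, view_b)]."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        if not pool:
            break
        models = fit_views(labelled)
        chosen = {}  # pool index -> label assigned in this round
        # Co-training step: each view auto-labels its most confident example.
        for k, model in enumerate(models):
            best = max(range(len(pool)), key=lambda i: abs(margin(model, pool[i][k])))
            chosen.setdefault(best, int(margin(model, pool[best][k]) > 0))
        # Active step: ask the human oracle about the least certain examples.
        rest = [i for i in range(len(pool)) if i not in chosen]
        rest.sort(key=lambda i: abs(sum(margin(m, pool[i][k]) for k, m in enumerate(models))))
        for i in rest[:n_query]:
            chosen[i] = oracle(pool[i])
        labelled += [(pool[i], lab) for i, lab in chosen.items()]
        pool = [v for i, v in enumerate(pool) if i not in chosen]
    return fit_views(labelled), labelled

def predict(models, views):
    """Combine both views by summing their margins."""
    return int(sum(margin(m, views[k]) for k, m in enumerate(models)) > 0)
```

Here `oracle` stands for the human annotator: any callable that takes a two-view document and returns 0 or 1. The seed set passed as `labelled` must contain at least one example of each class, since the stand-in classifier needs both centroids.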

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 50, Issue 5, September 2014, Pages 718–732