Article ID: 489558
Journal: Procedia Computer Science
Published Year: 2015
Pages: 10 Pages
File Type: PDF
Abstract

This paper presents a general, simple, yet effective method for weakly supervised sentiment classification in resource-poor languages. The input consists of weak training signals in the form of textual reviews and their associated ratings, which are available on many e-commerce websites. From these, our method computes class distributions for sentences using the statistical information of n-grams in the reviews. These distributions can be used directly to build sentiment classifiers in unsupervised settings, or as extra features to boost classification accuracy in semi-supervised settings. We empirically verified the effectiveness of the proposed method on two datasets in Japanese and Vietnamese. The results are promising: the method makes relatively accurate predictions even when no labeled data are given. In the semi-supervised settings, the method achieved a relative improvement of 1.8% to 4.7% over the purely supervised baseline, depending on the amount of labeled data.
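
The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that reviews are already tokenized (word segmentation for Japanese or Vietnamese is out of scope here), that star ratings are mapped to two classes by a hypothetical rule (4-5 positive, 1-2 negative, 3 ignored), that n-gram class distributions are smoothed relative frequencies, and that a sentence's distribution is the plain average over its known n-grams. All function names and parameters are illustrative.

```python
from collections import Counter, defaultdict

CLASSES = ("pos", "neg")


def ngrams(tokens, n_max=2):
    """Return all n-grams (as tuples) of length 1..n_max."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def rating_to_class(rating):
    """Hypothetical mapping from a star rating to a weak class label."""
    if rating >= 4:
        return "pos"
    if rating <= 2:
        return "neg"
    return None  # neutral reviews are skipped in this sketch


def train_ngram_distributions(reviews, n_max=2, alpha=1.0):
    """Estimate P(class | n-gram) from (tokens, rating) pairs.

    Each n-gram is counted once per review; alpha is add-alpha smoothing.
    """
    counts = defaultdict(Counter)
    for tokens, rating in reviews:
        label = rating_to_class(rating)
        if label is None:
            continue
        for gram in set(ngrams(tokens, n_max)):
            counts[gram][label] += 1

    dists = {}
    for gram, c in counts.items():
        total = sum(c[k] for k in CLASSES) + alpha * len(CLASSES)
        dists[gram] = {k: (c[k] + alpha) / total for k in CLASSES}
    return dists


def sentence_distribution(tokens, dists, n_max=2):
    """Average the class distributions of the sentence's known n-grams.

    Falls back to a uniform distribution if no n-gram was seen in training.
    """
    known = [dists[g] for g in ngrams(tokens, n_max) if g in dists]
    if not known:
        return {k: 1.0 / len(CLASSES) for k in CLASSES}
    return {k: sum(d[k] for d in known) / len(known) for k in CLASSES}
```

In an unsupervised setting, the class with the highest probability in `sentence_distribution` could serve as the prediction. In a semi-supervised setting, the probabilities could be appended to the feature vector of a supervised classifier trained on the available labeled sentences.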

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)