Leveraging User Ratings for Resource-poor Sentiment Classification

Article ID	Journal	Published Year	Pages	File Type
489558	Procedia Computer Science	2015	10 Pages	PDF

Abstract

This paper presents a general, simple, yet effective method for weakly supervised sentiment classification in resource-poor languages. The input is a set of weak training signals in the form of textual reviews and their associated ratings, which are available on many e-commerce websites. From these, our method computes class distributions for sentences using the statistical information of n-grams in the reviews. These distributions can be used directly to build sentiment classifiers in unsupervised settings, or as extra features to boost classification accuracy in semi-supervised settings. We empirically verified the effectiveness of the proposed method on two datasets, one in Japanese and one in Vietnamese. The results are promising, showing that the method makes relatively accurate predictions even when no labeled data are given. In the semi-supervised settings, the method achieved a relative improvement of 1.8% to 4.7% over the purely supervised baseline, depending on the amount of labeled data.
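
The idea of deriving sentence-level class distributions from n-gram statistics in rated reviews can be illustrated with a minimal Python sketch. It is not the paper's actual formulation, which the abstract does not specify. Everything concrete here is an assumption made for illustration: the two-class setup, the rating thresholds (1-2 stars negative, 4-5 stars positive, 3 stars skipped), whitespace tokenization (Japanese text would need a word segmenter first), add-alpha smoothing, and averaging n-gram distributions to score a sentence. All function names are hypothetical.

```python
from collections import Counter, defaultdict

CLASSES = ("neg", "pos")  # assumption: binary sentiment


def rating_to_class(rating):
    """Map a star rating to a weak label (assumed thresholds)."""
    if rating <= 2:
        return "neg"
    if rating >= 4:
        return "pos"
    return None  # neutral ratings carry no signal in this sketch


def ngrams(tokens, n_max=2):
    """Return all n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def train_ngram_counts(reviews, n_max=2):
    """Count how often each n-gram occurs in reviews of each weak class.

    `reviews` is an iterable of (text, rating) pairs.
    """
    counts = defaultdict(Counter)
    for text, rating in reviews:
        label = rating_to_class(rating)
        if label is None:
            continue
        for gram in ngrams(text.lower().split(), n_max):
            counts[gram][label] += 1
    return counts


def sentence_distribution(sentence, counts, n_max=2, alpha=1.0):
    """Average the smoothed P(class | n-gram) over the known n-grams.

    Falls back to a uniform distribution when no n-gram was seen in training.
    """
    grams = [g for g in ngrams(sentence.lower().split(), n_max) if g in counts]
    if not grams:
        return {c: 1.0 / len(CLASSES) for c in CLASSES}
    dist = {c: 0.0 for c in CLASSES}
    for gram in grams:
        total = sum(counts[gram].values()) + alpha * len(CLASSES)
        for c in CLASSES:
            dist[c] += (counts[gram][c] + alpha) / total
    return {c: value / len(grams) for c, value in dist.items()}


def predict(sentence, counts):
    """Unsupervised use: pick the class with the highest probability."""
    dist = sentence_distribution(sentence, counts)
    return max(dist, key=dist.get)
```

In an unsupervised setting, `predict` labels a sentence directly from the distribution. In a semi-supervised setting, the values returned by `sentence_distribution` could instead be appended to a supervised classifier's feature vector, in line with the "extra features" use described in the abstract.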