Article ID: 534200
Journal: Pattern Recognition Letters
Published Year: 2015
Pages: 7
File Type: PDF
Abstract

• Proposed a graphical model that extracts a sentiment lexicon using document annotations.
• Applied active learning to sentiment lexicon extraction to reduce the annotation effort.
• Proposed and tested four distinct initialization methods for active learners.
• Proposed a lexicon coverage analysis algorithm to initialize the active learner.

Recent research indicates that a sentiment lexicon focused on a specific domain yields better sentiment analysis than a general-purpose sentiment lexicon such as SentiWordNet. Despite this potential improvement, the cost of building a domain-specific sentiment lexicon hinders its wider practical application. To address this difficulty, we propose extracting a sentiment lexicon from a domain-specific corpus by annotating an intelligently selected subset of the documents in the corpus. Specifically, the subset is selected by an active learner initialized with diverse text analytics, i.e., latent Dirichlet allocation and our proposed lexicon coverage algorithm. This active learning produces a better domain-specific sentiment lexicon, which results in higher sentiment classification accuracy. We evaluate the extracted sentiment lexicons by observing (1) the increase in the F1 measure of sentiment classification and (2) the increased similarity to the sentiment lexicon obtained with full annotation. We expect this contribution to enable more accurate sentiment classification using domain-specific sentiment lexicons with less sentiment tagging effort.
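
The document-selection idea in the abstract can be illustrated with a minimal sketch of pool-based active learning. This is not the graphical model proposed in the article: it assumes a simple smoothed log-odds lexicon and uncertainty sampling (annotate the document whose current lexicon score is closest to zero). The names `train_lexicon`, `score`, and `active_learn` are hypothetical, and `seed_ids` merely stands in for an initialization such as latent Dirichlet allocation or the lexicon coverage algorithm, neither of which is reproduced here.

```python
import math
from collections import defaultdict


def train_lexicon(labeled):
    """Estimate word polarities as smoothed log-odds of document counts.

    labeled: list of (tokens, label) pairs with label +1 or -1.
    This stand-in model is an assumption, not the article's graphical model.
    """
    pos, neg = defaultdict(int), defaultdict(int)
    for tokens, label in labeled:
        counts = pos if label > 0 else neg
        for word in set(tokens):  # count each word once per document
            counts[word] += 1
    vocab = set(pos) | set(neg)
    return {w: math.log((pos[w] + 1) / (neg[w] + 1)) for w in vocab}


def score(lexicon, tokens):
    """Document score: sum of word polarities (unknown words count as 0)."""
    return sum(lexicon.get(w, 0.0) for w in tokens)


def active_learn(pool, oracle, seed_ids, budget):
    """Annotate up to `budget` documents, picking the most uncertain next.

    pool: dict mapping doc id -> token list
    oracle: callable returning the true label for a doc id (the annotator)
    seed_ids: initially annotated docs (placeholder for an initializer)
    """
    labeled_ids = list(seed_ids)
    limit = min(budget, len(pool))
    while len(labeled_ids) < limit:
        lexicon = train_lexicon([(pool[i], oracle(i)) for i in labeled_ids])
        unlabeled = [i for i in pool if i not in labeled_ids]
        # Uncertainty sampling: smallest |score|, ties broken by doc id.
        pick = min(unlabeled, key=lambda i: (abs(score(lexicon, pool[i])), i))
        labeled_ids.append(pick)
    lexicon = train_lexicon([(pool[i], oracle(i)) for i in labeled_ids])
    return lexicon, labeled_ids


# Toy corpus (invented for illustration only).
pool = {
    0: ["great", "battery"],
    1: ["poor", "battery"],
    2: ["great", "screen"],
    3: ["poor", "screen"],
    4: ["great", "sound"],
}
labels = {0: 1, 1: -1, 2: 1, 3: -1, 4: 1}
lexicon, chosen = active_learn(pool, labels.get, seed_ids=[0, 1], budget=4)
```

In this sketch only four of the five toy documents are annotated, yet domain words such as "great" and "poor" still receive polarities of the expected sign, while "battery", which appears in one positive and one negative document, stays neutral.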
