Article code: 558286
Journal code: 874892
Publication year: 2014
English article: 18-page PDF
Full-text version: free download
English title of the ISI article
Automatically annotating a five-billion-word corpus of Japanese blogs for sentiment and affect analysis
Persian translation of the title
به صورت اتوماتیک یک وبلاگ پنج میلیارد کلمه ای را برای احساسات و تجزیه و تحلیل تاثیر می گذارد؟
Related topics
Engineering and Basic Sciences, Computer Engineering, Signal Processing
English abstract


• We perform automatic annotation of a large blog corpus with affective information.
• A survey of emotion corpora shows that there is no large emotion corpus for Japanese.
• The annotations contain emotion classes, emoticons, valence/activation, etc.
• The annotations are evaluated on a random sample of 1,000 sentences.
• The statistics of the annotations are compared with those of other existing emotion corpora.

This paper presents our research on the automatic annotation of a five-billion-word corpus of Japanese blogs with information on affect and sentiment. We first survey emotion blog corpora and find that no large-scale emotion corpus has been available for the Japanese language. We choose the largest blog corpus for the language and annotate it with two systems for affect analysis: ML-Ask for word- and sentence-level affect analysis and CAO for detailed analysis of emoticons. The annotated information includes affective features such as sentence subjectivity (emotive/non-emotive) and emotion classes (joy, sadness, etc.), which are useful in affect analysis. The annotations are also generalized on a two-dimensional model of affect to obtain information on sentence valence (positive/negative), which is useful in sentiment analysis. The annotations are evaluated in several ways. First, they are evaluated on a test set of one thousand randomly extracted sentences judged by over forty respondents. Second, the statistics of the annotations are compared with those of other existing emotion blog corpora. Finally, the corpus is applied in several tasks, such as generating an emotion object ontology and retrieving the emotional and moral consequences of actions.
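
The generalization step mentioned in the abstract, from emotion classes to sentence valence, can be illustrated with a minimal Python sketch. It is not the method of ML-Ask or CAO. The table `EMOTION_TO_AFFECT`, the function `sentence_valence`, the class names other than joy and sadness, the quadrant assignments, and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical placement of emotion classes on a two-dimensional
# (valence, activation) plane. Only "joy" and "sadness" are named in the
# abstract; the other classes and every quadrant assignment here are
# illustrative assumptions, not the mapping used in the paper.
EMOTION_TO_AFFECT = {
    "joy": ("positive", "activated"),
    "relief": ("positive", "deactivated"),
    "anger": ("negative", "activated"),
    "sadness": ("negative", "deactivated"),
}


def sentence_valence(emotion_classes):
    """Generalize a sentence's emotion classes to a single valence label.

    Returns "positive", "negative", "mixed" on a tie, or None when no known
    emotion class is present (treated here as a non-emotive sentence).
    """
    votes = Counter(
        EMOTION_TO_AFFECT[e][0] for e in emotion_classes if e in EMOTION_TO_AFFECT
    )
    if not votes:
        return None
    if votes["positive"] == votes["negative"]:
        return "mixed"
    return "positive" if votes["positive"] > votes["negative"] else "negative"


if __name__ == "__main__":
    print(sentence_valence(["joy"]))
    print(sentence_valence(["sadness", "anger", "joy"]))
```

A simple vote over valence labels is only one possible design. Any real system would need its own rule for sentences that carry both positive and negative classes.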

Publisher
Database: Elsevier - ScienceDirect
Journal: Computer Speech & Language - Volume 28, Issue 1, January 2014, Pages 38–55