Article code: 402589
Journal code: 676968
Publication year: 2015
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
Discriminative subprofile-specific representations for author profiling in social media
Persian translation of the title
نمایندگی های اختصاصی زیرپروفی قابل تشخیص برای پروفایل نویسنده در رسانه های اجتماعی
Keywords
Author profiling, Web mining, Text classification, Social media
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

The Author Profiling (AP) task aims to reveal as much information as possible about an author (e.g., age and gender) from a given document written by that author. AP is crucial for several applications, ranging from customized advertising to computer forensics, psychology, and entertainment. Nonetheless, the AP task is far from being solved, particularly in social media domains, where the nature of the documents hinders the applicability of state-of-the-art text mining tools (e.g., because of spelling and grammar errors, huge vocabularies, and the presence of many out-of-vocabulary terms). Currently, most of the work on AP for social media has been devoted to the development of descriptive features, which are used under standard representations such as the Bag-of-Words (BoW). Nevertheless, BoW-like representations have some well-known shortcomings, namely: (i) the sparsity and high dimensionality of the representation, and (ii) the failure to capture relationships among terms other than mere occurrence. This paper focuses on the study of alternative document representations that can deal with such issues. We propose a representation for documents that captures discriminative and subprofile-specific information of terms. Under the proposed representation, terms are represented in a vector space that captures discriminative information. Then, term representations are aggregated to represent the content of a document. In this manner, documents are represented in a low-dimensional (and discriminative) space that is non-sparse. We evaluate the effectiveness of the proposed representation on several corpora from the social media domain. The proposed representation is compared to the standard BoW representation and a wide variety of state-of-the-art AP approaches. Experimental results reveal that the proposed representation outperforms most of the reference methodologies. Furthermore, we show that the proposed representation is in agreement with previous studies on handcrafted attributes for AP.
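
The two-step idea described in the abstract (represent each term in a discriminative vector space, then aggregate term vectors into a document vector) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function names `term_vectors` and `document_vector`, the log-damped length-normalised weighting, the toy corpus, and the use of plain profile labels in place of the subprofiles the paper proposes. It is not the authors' implementation.

```python
from collections import Counter, defaultdict
import math


def term_vectors(docs, labels):
    """Map each term to a vector with one weight per profile (class).

    Assumption: a term's weight for a profile is the sum, over that
    profile's training documents, of a log-damped relative frequency,
    normalised so each term vector sums to 1.
    """
    profiles = sorted(set(labels))
    index = {p: i for i, p in enumerate(profiles)}
    raw = defaultdict(lambda: [0.0] * len(profiles))
    for tokens, label in zip(docs, labels):
        for term, tf in Counter(tokens).items():
            raw[term][index[label]] += math.log2(1.0 + tf / len(tokens))
    vectors = {}
    for term, weights in raw.items():
        total = sum(weights)  # always > 0: every stored term occurred at least once
        vectors[term] = [w / total for w in weights]
    return profiles, vectors


def document_vector(tokens, vectors, n_profiles):
    """Aggregate term vectors into one dense, low-dimensional document vector.

    Each known term contributes its vector weighted by its relative
    frequency in the document; unseen terms are ignored.
    """
    known = [t for t in tokens if t in vectors]
    doc = [0.0] * n_profiles
    if not known:
        return doc
    for term, tf in Counter(known).items():
        for i, w in enumerate(vectors[term]):
            doc[i] += (tf / len(known)) * w
    return doc


# Hypothetical toy corpus: tokenised documents with an age-group label each.
docs = [["lol", "homework", "lol"], ["meeting", "mortgage"], ["homework", "exam"]]
labels = ["teens", "adults", "teens"]

profiles, vectors = term_vectors(docs, labels)
print(profiles)
print(document_vector(["lol", "exam", "mortgage"], vectors, len(profiles)))
```

In this sketch the document vector has one dimension per profile rather than one per vocabulary term, which is what makes it low-dimensional and non-sparse compared with BoW. The resulting vectors would then be passed to any standard classifier.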

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 89, November 2015, Pages 134–147