Article code: 6940518
Journal code: 1450014
Publication year: 2018
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
On the role of syntactic dependencies and discourse relations for author and gender identification
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract
Author and author gender identification are two major tasks in the context of profiling of authors of written material. Author identification (or, more precisely, “authorship attribution”) copes with the assignment of the author, who is to be chosen from a given list of author names, to a piece of written material. Gender identification deals with the prediction of the gender of the author (male vs. female). Both tasks are very relevant to a number of applications, including, e.g., plagiarism and deception detection, document authenticity verification, and blackmailing. State of the art in both fields tends to rely mainly upon lexical and token (sequence) distribution features. But this means to neglect numerous linguistic studies that clearly indicate the high relevance of “deep linguistic”, i.e., syntactic and discourse, features to the characterization of the style of an author or a group of authors. Our work on author and gender identification confirms this relevance. We show with two different genres, namely blog posts and literary writings, that the use of deep linguistic features is very effective. It leads to >78% (in the case of blog posts) and >91% (in the case of literary writings) of accuracy in author identification and >89% (blog posts) and >90% (literary writings) of accuracy in gender identification.
Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 105, 1 April 2018, Pages 87-95