Free article download: Text plagiarism classification using syntax based linguistic features

Article code	Journal code	Publication year	English article	Full-text version
4943302	1437618	2017	25-page PDF	Free download

English title of the ISI article

Text plagiarism classification using syntax based linguistic features

Persian translation of the title

طبقه بندی اسناد رسمی با استفاده از ویژگی های زبانی مبتنی بر نحو

Keywords

Syntactic features; Chunks; Linguistic features

Related topics

Engineering and Basic Sciences > Computer Engineering > Artificial Intelligence

English abstract

The proposed work models document-level text plagiarism detection as a binary classification problem, where the task is to classify a given suspicious-source document pair as plagiarized or non-plagiarized. The objective is to explore the potency of syntax-based linguistic features extracted using shallow natural language processing techniques for the plagiarism classification task. Shallow syntactic features, viz., part-of-speech tags and chunks, are utilized after effective pre-processing and filtering to prune irrelevant information. The work further proposes modelling this classification phase as an intermediate stage, placed after candidate source retrieval and before exhaustive passage-level detection. A two-phase feature selection approach is proposed, which improves the effectiveness of classification by selecting an appropriate set of features as the input to machine-learning-based classifiers. The proposed approach is evaluated under smaller and larger test conditions, using the corpus of plagiarized short answers (PSA) and plagiarism instances collected from the PAN corpus, respectively. Under both test conditions, performance is evaluated using general as well as advanced classification metrics. Another main contribution of the current work is the analysis of the dependencies and impact of the extracted features with respect to the type and complexity of plagiarism imposed in the documents. The results of the proposed approach are compared with two state-of-the-art approaches and outperform these baselines significantly. This in turn reflects the cogency of syntactic linguistic features in document-level plagiarism classification, especially for instances close to manual or real plagiarism scenarios.

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 88, 1 December 2017, Pages 448-464

Authors

Vani K, Deepa Gupta
