Article code | Journal code | Publication year | English article | Full-text version
384690 | 660853 | 2013 | 10-page PDF | Free download
English title of the ISI article
Determining and characterizing the reused text for plagiarism detection
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

An important task in plagiarism detection is determining and measuring similar text portions between a given pair of documents. One of the main difficulties of this task is that reused text is commonly modified in order to conceal the plagiarism. Another difficulty is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. To tackle these problems, we propose a novel method for detecting likely portions of reused text. This method is able to detect common actions performed by plagiarists, such as word deletion, insertion and transposition, which makes it possible to obtain plausible portions of reused text. We also propose representing the identified reused text by means of a set of features that denote its degree of plagiarism, relevance and fragmentation. This new representation aims to facilitate the recognition of plagiarism by considering diverse characteristics of the reused text during the classification phase. Experimental results employing a supervised classification strategy showed that the proposed method is able to outperform traditionally used approaches.


► We have proposed a new method for detecting document plagiarism.
► It allows the identification of similar reused word strings between two documents.
► We are able to discover reused text that has undergone some modifications.
► We proposed a richer representation of the portions of reused text.
► We proposed a simple methodology that automatically selects the best configuration.
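
The idea of locating reused word strings and then describing them with a few features can be illustrated with a minimal sketch. This is not the authors' method: it assumes a plain word-level alignment from Python's standard `difflib`, which tolerates inserted and deleted words but does not handle transpositions. The function names `reused_fragments` and `describe`, the `min_len` threshold, and the three features returned are hypothetical choices made only for this illustration.

```python
from difflib import SequenceMatcher


def reused_fragments(suspicious, source, min_len=2):
    """Return the word strings shared by both texts, in order of appearance.

    Assumption: whitespace tokenization and lowercasing are enough for a demo.
    """
    a = suspicious.lower().split()
    b = source.lower().split()
    # autojunk=False disables difflib's heuristic that ignores frequent tokens.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [a[m.a:m.a + m.size]
            for m in matcher.get_matching_blocks()
            if m.size >= min_len]  # also skips the zero-length sentinel block


def describe(suspicious, source, min_len=2):
    """Summarize the shared fragments with three illustrative features."""
    fragments = reused_fragments(suspicious, source, min_len)
    total_words = len(suspicious.split())
    reused_words = sum(len(f) for f in fragments)
    return {
        # Share of the suspicious text covered by reused fragments.
        "reused_ratio": reused_words / total_words if total_words else 0.0,
        # More fragments for the same coverage means more fragmentation.
        "num_fragments": len(fragments),
        "longest_fragment": max((len(f) for f in fragments), default=0),
    }


if __name__ == "__main__":
    src = "the quick brown fox jumps over the lazy dog"
    sus = "the quick brown fox really jumps over the dog"
    print(reused_fragments(sus, src))
    print(describe(sus, src))
```

In this toy pair, one word is inserted and one is deleted, so the shared text splits into two fragments instead of one. A feature vector of this kind could then be passed to any supervised classifier, which is the general setting the abstract describes.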

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 40, Issue 5, April 2013, Pages 1804–1813