Article ID Journal Published Year Pages File Type
4966481 Information Processing & Management 2016 14 Pages PDF
Abstract
The rapid growth of documents in different languages, the increased accessibility of electronic documents, and the availability of translation tools have caused cross-lingual plagiarism detection research area to receive increasing attention in recent years. The task of cross-language plagiarism detection entails two main steps: candidate retrieval and assessing pairwise document similarity. In this paper we examine candidate retrieval, where the goal is to find potential source documents of a suspicious text. Our proposed method for cross-language plagiarism detection is a keyword-focused approach. Since plagiarism usually happens in parts of the text, there is a requirement to segment the texts into fragments to detect local similarity. Therefore we propose a topic-based segmentation algorithm to convert the suspicious document to a set of related passages. After that, we use a proximity-based model to retrieve documents with the best matching passages. Experiments show promising results for this important phase of cross-language plagiarism detection.
Related Topics
Physical Sciences and Engineering Computer Science Computer Science Applications
Authors
, ,