Article ID: 455622
Journal: Computers & Electrical Engineering
Published Year: 2014
Pages: 12 Pages
File Type: PDF
Abstract

• Using Bloom filters to enhance the efficiency of the plagiarism detection system.
• Enhancing the speed of plagiarism detection and reducing the memory required to store the documents.
• Considering the privacy of content while running the similarity estimation process.
• Adjustable according to the requirements of the relevant application.

With easy access to the huge volume of articles available on the Internet, plagiarism is becoming increasingly prevalent. Most recent approaches to this problem focus on improving the accuracy of the similarity detection process. However, in some real applications plagiarized content must be detected without revealing any information about the content. Moreover, in such web-based applications, running time, memory consumption, communication cost and computational complexity should also be taken into account. In this paper, we propose a similar-document detection system based on the matrix Bloom filter, a new extension of the standard Bloom filter. Experimental results on a real dataset show that the system can achieve 98% accuracy. We also compare our approach with a recently proposed method for the same purpose. The comparison shows that the Bloom filter-based approach performs much better than the other method in terms of the aforementioned factors.
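
The abstract does not spell out how the matrix Bloom filter is built, so the following Python sketch is only an illustration of the general idea under stated assumptions: each document is reduced to word shingles, the shingles are hashed into one standard Bloom filter per document, and the filters are stacked as rows of a "matrix". Similarity is then estimated from shared set bits, so only bit vectors (not the text itself) need to be exchanged. The names `BloomFilter`, `shingles`, `document_filter` and `similarity`, the shingle size, the filter size, the hash construction and the Jaccard-style bit ratio are all hypothetical choices made for this example, not the authors' method.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def shingles(text, k=3):
    """Split text into overlapping k-word shingles (assumed preprocessing)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def document_filter(text, num_bits=1024, num_hashes=3):
    """Build one Bloom filter (one 'row') for a document."""
    bf = BloomFilter(num_bits, num_hashes)
    for s in shingles(text):
        bf.add(s)
    return bf


def similarity(a, b):
    """Jaccard-style estimate: shared set bits over all set bits."""
    common = bin(a.bits & b.bits).count("1")
    union = bin(a.bits | b.bits).count("1")
    return common / union if union else 0.0


# Hypothetical corpus: one filter per document, stacked as matrix rows.
corpus = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "stock markets fell sharply after central banks raised interest rates again today",
]
matrix = [document_filter(doc) for doc in corpus]

query = document_filter(
    "the quick brown fox jumps over the lazy dog near the river side"
)
scores = [similarity(query, row) for row in matrix]
best_match = max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a larger `num_bits` lowers the chance that unrelated documents share bits by accident, at the cost of more memory per row, which is one way such a system could be tuned to an application's requirements.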

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications