Article code: 380858 | Journal code: 1437455 | Publication year: 2013 | English article: 8 pages | Full text: PDF
ISI article title (English)
An application for plagiarized source code detection based on a parse tree kernel
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract


• A program plagiarism detection method based on parse tree similarity.
• Parse trees are compared in a kernel space.
• A new parse tree kernel tailored to source code is proposed to improve detection performance.
• Evaluation on real-world data yielded a maximum F1 score of 0.93.

Program plagiarism detection is the task of identifying plagiarized pairs of programs within a set of source code files. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. The kernel computes a similarity value between two programs from the similarity of their parse trees. Since parse trees capture the essential syntactic structure of source code, the system handles structural information effectively. The contributions of this paper are twofold. First, we propose a parse tree kernel optimized for program source code; the evaluation shows that a system based on this kernel outperforms well-known baseline systems. Second, we collected a large set of real-world Java source files from a university programming class. Two independent human annotators manually analyzed this test set and tagged the plagiarized code. The set can be used to evaluate the performance of various detection systems in real-world settings. Experiments on the test set show that our plagiarism detection system reaches 93% of the performance level of the human annotators.
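The abstract does not give the kernel's definition. As a point of reference only, the following Python sketch shows the standard subset-tree kernel of Collins and Duffy, a widely used parse tree kernel, together with the normalized pairwise similarity and threshold test that a detection system of this kind might apply. It is not the authors' source-code-optimized kernel. The `Node` class, the identifier normalization in the hypothetical `ident` helper, the decay `lam=0.5`, and the 0.8 threshold are all illustrative assumptions, and the trees are built by hand instead of being produced by a Java parser.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A parse-tree node: a grammar label plus an ordered tuple of children."""
    label: str
    children: tuple = ()


def production(n):
    # The grammar rule applied at n: its label plus its children's labels.
    return (n.label, tuple(c.label for c in n.children))


def all_nodes(root):
    # Iterative traversal collecting every node of the tree.
    stack, out = [root], []
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out


def tree_kernel(t1, t2, lam=0.5):
    """Collins-Duffy subset-tree kernel: a weighted count of shared tree
    fragments, where lam (0 < lam <= 1) down-weights larger fragments."""
    memo = {}

    def common(n1, n2):
        key = (id(n1), id(n2))
        if key not in memo:
            if not n1.children or production(n1) != production(n2):
                # Leaves, or nodes built by different rules, root no shared fragment.
                memo[key] = 0.0
            else:
                # Same rule: each child pair either ends the fragment (the 1)
                # or extends it with a fragment they share (common(a, b)).
                val = lam
                for a, b in zip(n1.children, n2.children):
                    val *= 1.0 + common(a, b)
                memo[key] = val
        return memo[key]

    return sum(common(a, b) for a in all_nodes(t1) for b in all_nodes(t2))


def similarity(t1, t2, lam=0.5):
    # Normalize by the self-kernels so the score lies in [0, 1].
    k11, k22 = tree_kernel(t1, t1, lam), tree_kernel(t2, t2, lam)
    if k11 == 0 or k22 == 0:
        return 0.0
    return tree_kernel(t1, t2, lam) / math.sqrt(k11 * k22)


def flag_pairs(trees, threshold=0.8, lam=0.5):
    # Compare every pair of files once; report pairs at or above the threshold.
    names = sorted(trees)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if similarity(trees[a], trees[b], lam) >= threshold]


def ident(_name):
    # Hypothetical normalization: drop the identifier's spelling so that
    # consistently renamed variables yield identical subtrees.
    return Node("Id", (Node("ID"),))


def assign(lhs, a, b):
    # Hand-built tree for the statement `lhs = a + b;`
    return Node("Assign", (ident(lhs), Node("Plus", (ident(a), ident(b)))))


trees = {
    "A.java": assign("x", "a", "b"),
    "B.java": assign("y", "c", "d"),                            # renamed copy of A.java
    "C.java": Node("Return", (Node("Call", (ident("f"),)),)),  # `return f();`
}
print(flag_pairs(trees))  # prints [('A.java', 'B.java')]
```

Normalizing by the self-kernels keeps every score between 0 and 1 regardless of program size, so one threshold can be applied to all pairs. Here the renamed copy scores 1.0 against the original, while the unrelated statement scores about 0.38.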


Publisher
Database: Elsevier - ScienceDirect
Journal: Engineering Applications of Artificial Intelligence - Volume 26, Issue 8, September 2013, Pages 1911–1918