Article code: 433355
Journal code: 1441667
Publication year: 2014
English article: 19 pages (full text, PDF)
Title (ISI article)
SeByte: Scalable clone and similarity search for bytecode
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
Abstract

While source code clone detection is a well-established research area, finding similar code fragments in binary and other intermediate code representations has not yet been widely studied. In this paper, we introduce SeByte, a bytecode clone detection and search model that applies semantic-enabled token matching. It is based on the idea of relaxing code fingerprints. The approach separates the input content into different dimensions according to token type, with each dimension representing the input from a specific point of view. SeByte then compares each dimension separately and independently, which we refer to as multi-dimensional comparison. As the similarity function, we use the Jaccard similarity coefficient, a well-known measure that supports our multi-dimensional comparison heuristic. Our preliminary study shows that SeByte can detect clones that existing approaches miss because of differences in the input data and the search algorithm. We then further exploit the model to build a scalable bytecode clone search engine. This extension meets the requirements of a classical search engine, including the ranking of result sets. Our evaluation with a large dataset of 500,000 compiled Java classes, extracted from the six most recent versions of the Eclipse IDE, shows that SeByte search is not only scalable but also capable of providing a reliable ranking.
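The idea of splitting tokens into dimensions and scoring each one independently with the Jaccard coefficient can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical choice for illustration, not SeByte's actual design: the dimension names (`opcode`, `call`, `type`), the `INVOKE:`/`TYPE:` token prefixes, treating tokens as sets, the per-dimension threshold filter, and ranking by the mean of the per-dimension scores.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dimension names; the abstract only says tokens are split by type.
DIMENSIONS = ("opcode", "call", "type")


def classify(token: str) -> str:
    """Assign a token to a dimension (illustrative prefix heuristic only)."""
    if token.startswith("INVOKE:"):
        return "call"
    if token.startswith("TYPE:"):
        return "type"
    return "opcode"


def split_dimensions(tokens: Iterable[str]) -> Dict[str, Set[str]]:
    """Group a fragment's tokens into one set per dimension."""
    dims: Dict[str, Set[str]] = {d: set() for d in DIMENSIONS}
    for tok in tokens:
        dims[classify(tok)].add(tok)
    return dims


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare(query: Iterable[str], candidate: Iterable[str]) -> Dict[str, float]:
    """Score each dimension separately and independently."""
    qd, cd = split_dimensions(query), split_dimensions(candidate)
    return {d: jaccard(qd[d], cd[d]) for d in DIMENSIONS}


def search(
    query: List[str], corpus: Dict[str, List[str]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep candidates passing the threshold in every dimension (assumed
    filter), then rank them by the mean per-dimension score (assumed)."""
    results: List[Tuple[str, float]] = []
    for name, tokens in corpus.items():
        scores = compare(query, tokens)
        if all(s >= threshold for s in scores.values()):
            results.append((name, sum(scores.values()) / len(scores)))
    results.sort(key=lambda r: (-r[1], r[0]))  # best score first, then by name
    return results
```

For example, with a query whose tokens are `ALOAD`, `INVOKE:java/util/List.add`, `TYPE:java/util/ArrayList` and `RETURN`, an identical candidate scores 1.0 in every dimension, while a candidate that adds one extra `ASTORE` instruction keeps a perfect score in the call and type dimensions but drops to 2/3 in the opcode dimension. Keeping the dimensions separate means a difference in one kind of token does not dilute matches in the others.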

Publisher
Database: Elsevier - ScienceDirect
Journal: Science of Computer Programming - Volume 95, Part 4, 1 December 2014, Pages 426–444
Authors
, , ,