SeByte: Scalable clone and similarity search for bytecode

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
433355	1441667	2014	19 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Java bytecode Clone detection - تشخیص کلون Semantic search - جستجو معنایی Semantic web - وب معنایی

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات

پیش نمایش صفحه اول مقاله

SeByte: Scalable clone and similarity search for bytecode

چکیده انگلیسی

While source code clone detection is a well-established research area, finding similar code fragments in binary and other intermediate code representations has been not yet that widely studied. In this paper, we introduce SeByte, a bytecode clone detection and search model that applies semantic-enabled token matching. It is developed based on the idea of relaxation on the code fingerprints. This approach separates the input content based on the types of tokens into different dimensions, with each dimension representing the input content from a specific point of view. Following this approach, SeByte compares each dimension separately and independently which we refer to as multi-dimensional comparison in our research. As the similarity search function we use a well-known measure that supports our multi-dimensional comparison heuristic, the Jaccard similarity coefficient. Our preliminary study shows that SeByte can detect clones that are missed by existing approaches due to the differences in the input data and the search algorithm. We then further exploit the model to build a scalable bytecode clone search engine. This extension meets the requirements of a classical search engine including the ranking of result sets. Our evaluation with a large dataset of 500,000 compiled Java classes, which we extracted from the six most recent versions of the Eclipse IDE, showed that our SeByte search is not only scalable but also capable of providing a reliable ranking.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Science of Computer Programming - Volume 95, Part 4, 1 December 2014, Pages 426–444

نویسندگان

Iman Keivanloo, Chanchal K. Roy, Juergen Rilling,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

SeByte: Scalable clone and similarity search for bytecode

دسترسی سریع

ارتباط

English Website