Article code: 551043 | Journal code: 1450775 | Year of publication: 2015 | English article: 13 pages, PDF
English title of the ISI article
Approach for estimating similarity between procedures in differently compiled binaries
Keywords
Software clone, clone detection, semantic clone, software metric, binary code analysis
Related topics
Engineering and Basic Sciences, Computer Engineering, Human-Computer Interaction
Abstract


• A new software-metrics-based approach for detecting software clones in binary code.
• New metrics that are robust to the syntactic changes introduced by different compilers.
• Evaluation of different ways of combining the extracted metrics.
• Knowing which compiler was used improves recall by up to 2.28 times.

Context: Detecting unauthorized use of a software library is a clone detection problem. For commercial products it is harder still, because only the binary code is available.

Objective: The goal of this paper is to propose an approach for estimating the level of similarity between procedures originating from different binaries. The assumption is that clones in the binaries come from the use of a common software library that may have been compiled with different toolsets.

Method: The approach uses a set of software metrics adapted from high-level languages and extends it with new metrics that account for the syntactic changes introduced by different toolsets and optimizations. It then compares metric values and introduces transformers and formulas that can use training data to produce a measure of similarity between two procedures in binary code. The approach was evaluated on programs from the STAMP benchmark and on the BusyBox tool, compiled with different toolsets in different modes.

Results: The experiments with programs from the STAMP benchmark show that recall in detecting the same procedures can be up to 1.44 times higher when the new metrics are used. Knowledge of the compiling toolset used can improve recall by up to 2.28 times. The experiment with the BusyBox tool shows 43% recall at 43% precision.

Conclusion: The most useful newly proposed metrics are those that consider the frequency of arithmetic instructions, the number and frequency of occurrences of instructions, and the number of occurrences of target addresses in calls. The best way to combine the results of comparing metrics is to use a geometric mean or, when prior knowledge is available, an arithmetic mean with an appropriate transformer.
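
To make the combination step concrete, here is a minimal Python sketch of comparing two procedures by per-metric similarity and then combining the scores with a geometric or an arithmetic mean. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the formulas, so the mnemonic set `ARITHMETIC`, the metric names, and the min/max ratio used in `metric_similarity` are all hypothetical, and the trained transformers are not modeled.

```python
from collections import Counter
from math import prod

# Hypothetical set of arithmetic mnemonics (x86-like); an assumption for illustration.
ARITHMETIC = {"add", "sub", "mul", "imul", "div", "idiv", "inc", "dec", "neg"}


def extract_metrics(mnemonics):
    """Compute a few illustrative metrics from a procedure's instruction mnemonics."""
    counts = Counter(mnemonics)
    total = len(mnemonics)
    arithmetic = sum(counts[m] for m in ARITHMETIC)
    return {
        "instruction_count": float(total),
        "distinct_instructions": float(len(counts)),
        "arithmetic_frequency": arithmetic / total if total else 0.0,
        "call_count": float(counts["call"]),
    }


def metric_similarity(a, b):
    """Assumed ratio-based similarity in [0, 1] for non-negative metric values."""
    if a == b:
        return 1.0
    return min(a, b) / max(a, b)


def arithmetic_mean(scores):
    return sum(scores) / len(scores)


def geometric_mean(scores):
    # A single zero score drives the whole result to zero.
    return prod(scores) ** (1.0 / len(scores))


def procedure_similarity(metrics_a, metrics_b, combine=geometric_mean):
    """Combine per-metric similarities into one score for a pair of procedures."""
    scores = [metric_similarity(metrics_a[k], metrics_b[k]) for k in sorted(metrics_a)]
    return combine(scores)


# Two made-up instruction sequences standing in for the same procedure
# compiled by different toolsets.
proc_a = extract_metrics(["push", "mov", "add", "add", "call", "mul", "ret"])
proc_b = extract_metrics(["push", "mov", "mov", "add", "lea", "call", "ret"])

print("geometric: ", procedure_similarity(proc_a, proc_b, geometric_mean))
print("arithmetic:", procedure_similarity(proc_a, proc_b, arithmetic_mean))
```

The geometric mean never exceeds the arithmetic mean for scores in [0, 1], so it penalizes a pair of procedures more heavily when even one metric disagrees strongly. That is one plausible reading of why it works well when no prior knowledge about the toolset is available.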

Publisher
Database: Elsevier - ScienceDirect
Journal: Information and Software Technology - Volume 58, February 2015, Pages 259–271