Article code: 386787
Journal code: 660891
Publication year: 2014
English article: 14 pages, PDF
English title of the ISI article
Dendroid: A text mining approach to analyzing and classifying code structures in Android malware families
Related topics
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


• We analyze the statistical features of the code structures of Android malware.
• We describe Dendroid, a text mining approach to classify and analyze Android malware.
• Dendrograms derived from hierarchical clustering reveal evolutionary relationships.
• Experiments show that Dendroid is an accurate and scalable support tool for analysts.

The rapid proliferation of smartphones over the last few years has come hand in hand with an impressive growth in the number and sophistication of malicious apps targeting smartphone users. The availability of reuse-oriented development methodologies and automated malware production tools makes it exceedingly easy to produce new specimens. As a result, market operators and malware analysts are increasingly overwhelmed by the number of newly discovered samples that must be analyzed. This situation has stimulated research into intelligent instruments that automate parts of the malware analysis process. In this paper, we introduce Dendroid, a system based on text mining and information retrieval techniques for this task. Our approach is motivated by a statistical analysis of the code structures found in a dataset of Android OS malware families, which reveals some parallels with classical problems in those domains. We then adapt the standard Vector Space Model and reformulate the modelling process followed in text mining applications. This enables us to measure similarity between malware samples, which is then used to automatically classify them into families. We also investigate the application of hierarchical clustering over the feature vectors obtained for each malware family. The resulting dendrograms resemble the so-called phylogenetic trees for biological species, allowing us to conjecture about evolutionary relationships among families. Our experimental results suggest that the approach is remarkably accurate and deals efficiently with large databases of malware instances.
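The pipeline outlined in the abstract (vector space model, similarity-based classification, hierarchical clustering of family vectors) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the weighting scheme, similarity measure, classifier, or linkage criterion, so the smoothed TF-IDF weights, cosine similarity, nearest-family rule, and single linkage used here are assumptions. The family names (`FamA`, `FamB`, `FamC`) and code-structure tokens (`s1`, `s2`, ...) are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(families):
    """Map each family to a sparse TF-IDF vector over its code-structure tokens."""
    n = len(families)
    # Document frequency: number of families in which each token occurs.
    df = Counter(tok for toks in families.values() for tok in set(toks))
    vectors = {}
    for name, toks in families.items():
        tf = Counter(toks)
        # Smoothed IDF (an assumption) keeps tokens shared by all families nonzero.
        vectors[name] = {
            t: (c / len(toks)) * math.log(1 + n / df[t]) for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def classify(sample_tokens, vectors):
    """Assign a sample to the family whose vector is most similar to it."""
    sample = dict(Counter(sample_tokens))  # raw term counts for the sample
    return max(vectors, key=lambda name: cosine(sample, vectors[name]))


def single_linkage(vectors):
    """Agglomerative clustering; returns the merge order as (a, b, distance)."""
    clusters = [(name,) for name in sorted(vectors)]
    merges = []

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(1.0 - cosine(vectors[x], vectors[y]) for x in a for y in b)

    while len(clusters) > 1:
        pairs = ((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:])
        a, b = min(pairs, key=lambda pq: dist(*pq))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data: each family is a bag of code-structure identifiers.
families = {
    "FamA": ["s1", "s2", "s3", "s1"],
    "FamB": ["s1", "s2", "s4"],
    "FamC": ["s7", "s8", "s9"],
}
vectors = tf_idf_vectors(families)
print(classify(["s7", "s8"], vectors))
for a, b, d in single_linkage(vectors):
    print(a, b, round(d, 3))
```

The merge list returned by `single_linkage` is the information a dendrogram displays: families that share more code structures are joined earlier and at a smaller distance.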

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 4, Part 1, March 2014, Pages 1104–1117