Article ID Journal Published Year Pages File Type
6856752 Information Sciences 2018 23 Pages PDF
Abstract
In recent years, the number and variety of malicious mobile apps have increased drastically, especially on Android platform, which brings insurmountable challenges for malicious app detection. Researchers endeavor to discover the traces of malicious apps using network traffic analysis. In this study, we combine network traffic analysis with machine learning methods to identify malicious network behavior, and eventually to detect malicious apps. However, most network traffic generated by malicious apps is benign, while only a small portion of traffic is malicious, leading to an imbalanced data problem when the traffic model skews towards modeling the benign traffic. To address this problem, we introduce imbalanced classification methods, including the synthetic minority oversampling technique (SMOTE) + support vector machine (SVM), SVM cost-sensitive (SVMCS), and C4.5 cost-sensitive (C4.5CS) methods. However, when the imbalance rate reaches a certain threshold, the performance of common imbalanced classification algorithms degrades significantly. To avoid performance degradation, we propose to use the imbalanced data gravitation-based classification (IDGC) algorithm to classify imbalanced data. Moreover, we develop a simplex imbalanced data gravitation classification (S-IDGC) model to further reduce the time costs of IDGC without sacrificing the classification performance. In addition, we propose a machine learning based comparative benchmark prototype system, which provides users with substantial autonomy, such as multiple choices of the desired classifiers or traffic features. Using this prototype system, users can compare the detection performance of different classification algorithms on the same data set, as well as the performance of a specific classification algorithm on multiple data sets.
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , , , , , ,