Article ID: 4954566
Journal: Computer Networks
Published Year: 2017
Pages: 15
File Type: PDF
Abstract
Protocol traffic analysis is a fundamental problem for a variety of networking and security applications, such as intrusion detection and prevention systems, network management systems, and protocol specification parsers. In this paper, we propose ProHacker, a nonparametric approach that extracts robust and accurate protocol keywords from the byte sequences generated by an application protocol and effectively identifies that protocol's traces in mixed Internet traffic. ProHacker is based on the key insight that the n-grams of protocol traces have a highly predictable statistical nature that can be captured by statistical language models and leveraged for robust and accurate protocol identification. In ProHacker, we first extract protocol keywords using a nonparametric Bayesian statistical model, and then use those keywords to classify protocol traces with a semi-supervised learning algorithm. We implement and evaluate ProHacker on real-world traces, and our experimental results show that ProHacker can accurately identify protocol traces with an average precision of about 99.4% and an average recall of about 99.28%. Using backbone traffic, we compare ProHacker with one state-of-the-art approach, ProWord, and with our own previous work, Securitas. The comparison shows that ProHacker provides significant improvements in precision and recall for online protocol identification.
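To make the n-gram idea concrete, the following is a minimal sketch and not ProHacker's actual method: it replaces the nonparametric Bayesian model with a plain document-frequency count, keeping byte n-grams that recur across most payloads as candidate keywords. The function names, the sample payloads, and the parameters `n` and `min_support` are all assumptions made for illustration.

```python
from collections import Counter


def byte_ngrams(payload: bytes, n: int):
    """Yield every contiguous n-byte window of a payload."""
    for i in range(len(payload) - n + 1):
        yield payload[i:i + n]


def candidate_keywords(payloads, n=4, min_support=0.5):
    """Return n-grams found in at least min_support (a fraction) of payloads.

    This is a simple frequency heuristic standing in for the statistical
    language model described in the abstract (an assumption, not the
    paper's algorithm).
    """
    doc_freq = Counter()
    for p in payloads:
        # Use a set so each n-gram counts once per payload.
        doc_freq.update(set(byte_ngrams(p, n)))
    threshold = min_support * len(payloads)
    return {g: c for g, c in doc_freq.items() if c >= threshold}


# Hypothetical payloads resembling HTTP request lines.
payloads = [
    b"GET /index.html HTTP/1.1\r\n",
    b"GET /a.png HTTP/1.1\r\n",
    b"POST /form HTTP/1.1\r\n",
]
keywords = candidate_keywords(payloads, n=4, min_support=1.0)
```

With `min_support=1.0`, only n-grams present in every payload survive, so `b"HTTP"` is kept while `b"GET "` (absent from the POST line) is dropped. A real system would need a model that also handles variable-length keywords and noise, which this sketch does not attempt.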