Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4954566 | Computer Networks | 2017 | 15 Pages | |
Abstract
Protocol traffic analysis is a fundamental problem for a variety of networking and security applications, such as intrusion detection and prevention systems, network management systems, and protocol specification parsers. In this paper, we propose ProHacker, a nonparametric approach that extracts robust and accurate protocol keywords from the byte sequences generated by an application protocol and uses them to identify that protocol's traces in mixed Internet traffic. ProHacker is based on the key insight that the n-grams of protocol traces have a highly predictable statistical nature, which can be effectively captured by statistical language models and leveraged for robust and accurate protocol identification. ProHacker first extracts protocol keywords using a nonparametric Bayesian statistical model and then uses these keywords to classify protocol traces with a semi-supervised learning algorithm. We implement ProHacker and evaluate it on real-world traces; our experimental results show that it identifies protocol traces with an average precision of about 99.4% and an average recall of about 99.28%. Using backbone traffic, we compare ProHacker with one state-of-the-art approach, ProWord, and with one of our previous works, Securitas. We find that ProHacker provides significant improvements in precision and recall for online protocol identification.
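To make the keyword-then-classify pipeline concrete, here is a minimal sketch under stated assumptions. It is not ProHacker: the nonparametric Bayesian keyword model is replaced by a plain document-frequency threshold over byte n-grams, and the semi-supervised classifier is replaced by a simple "contains at least `min_hits` keywords" rule. The function names and the parameters `n`, `min_support`, and `min_hits` are hypothetical. The sketch also shows how the precision and recall figures quoted above are defined.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def byte_ngrams(payload: bytes, n: int) -> list[bytes]:
    """Return every contiguous n-byte substring of a payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def candidate_keywords(payloads: Iterable[bytes], n: int = 4,
                       min_support: float = 0.5) -> set[bytes]:
    """Keep n-grams that occur in at least `min_support` of the payloads.

    Hypothetical stand-in: a document-frequency threshold, NOT the
    nonparametric Bayesian model used by ProHacker.
    """
    payloads = list(payloads)
    doc_freq: Counter[bytes] = Counter()
    for p in payloads:
        doc_freq.update(set(byte_ngrams(p, n)))  # count each n-gram once per payload
    threshold = min_support * len(payloads)
    return {g for g, c in doc_freq.items() if c >= threshold}


def matches_protocol(payload: bytes, keywords: set[bytes], n: int = 4,
                     min_hits: int = 1) -> bool:
    """Label a payload as the protocol if it contains enough keywords.

    Hypothetical stand-in for ProHacker's semi-supervised classifier.
    """
    hits = sum(1 for g in set(byte_ngrams(payload, n)) if g in keywords)
    return hits >= min_hits


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy "protocol traces": three HTTP-like request lines.
    training = [b"GET /index.html HTTP/1.1\r\n",
                b"GET /a.png HTTP/1.1\r\n",
                b"POST /form HTTP/1.1\r\n"]
    kw = candidate_keywords(training, n=4, min_support=1.0)
    print(sorted(kw))
    print(matches_protocol(b"GET /x HTTP/1.1\r\n", kw))    # HTTP-like payload
    print(matches_protocol(b"\x16\x03\x01\x02\x00", kw))   # TLS-like bytes
```

With `min_support=1.0`, only the n-grams shared by all three toy requests survive (the pieces of `" HTTP/1.1\r\n"`), which mirrors the intuition that protocol keywords are the byte patterns that recur across a protocol's traces.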
Authors
YiPeng Wang, Xiaochun Yun, Yongzheng Zhang, Liwei Chen, Tianning Zang