Article ID: 6874502
Journal: Journal of Computational Science
Published Year: 2017
Pages: 11 Pages
File Type: PDF
Abstract
The paper concerns the issue of modeling and generating a representative Web workload for Web server performance evaluation through simulation experiments. Web traffic analysis has been carried out for two decades, usually based on Web server log data. However, while the character of the overall Web traffic has been extensively studied and modeled, relatively few studies have been devoted to the analysis of Web traffic generated by Internet robots (Web bots). Moreover, the overwhelming majority of studies concern the traffic on non-e-commerce websites. In this paper we address the problem of modeling a realistic arrival process of bots' requests on an e-commerce Web server. Based on real log data for an online store, sessions generated by bots were reconstructed and their key features were analyzed, including the interarrival time of bot sessions, the number of HTTP requests per session, and the interarrival time of requests within a session. To deal with the non-stationarity of the Web traffic, chunks associated with times of day were distinguished based on the intensity of bot session arrivals, and the features of sessions in individual time chunks were then analyzed separately. Using regression analysis, a mathematical model of the bots' traffic features was developed and implemented in a bot traffic generator. Our findings confirm the existence of a heavy tail in the distributions of bot traffic features. The bots' session interarrival times and request interarrival times are best modeled by a Weibull and a sigmoid distribution, respectively, while the model proposed for the number of requests per bot session is based on a hybrid function combining one exponential and two normal distribution functions. The goodness of fit of the model was confirmed by the high correlation between the real and model data. Furthermore, a visual inspection of the simulation results showed that the estimated values represent distributions close to those of the empirical data.
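As a rough illustration of the session-arrival part of such a generator, the sketch below draws session interarrival times from a Weibull distribution and accumulates them into arrival timestamps. It is not the authors' implementation: the function name `generate_session_arrivals` and the `scale` and `shape` values are placeholders chosen for the example, not parameters fitted in the paper, and the time-of-day chunks are ignored.

```python
import random


def generate_session_arrivals(n_sessions, scale, shape, seed=None):
    """Return cumulative arrival times (in seconds) for n_sessions bot sessions.

    Interarrival times are sampled from a Weibull distribution.
    scale and shape are illustrative placeholders, not fitted values.
    """
    rng = random.Random(seed)
    arrivals = []
    t = 0.0
    for _ in range(n_sessions):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
        t += rng.weibullvariate(scale, shape)
        arrivals.append(t)
    return arrivals


# Hypothetical parameters: a shape below 1 gives a heavier right tail
# than an exponential distribution with a comparable scale.
arrivals = generate_session_arrivals(1000, scale=60.0, shape=0.7, seed=42)
```

To reflect the non-stationarity described above, one could call such a function once per time-of-day chunk, each with its own parameter pair.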
Related Topics
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics