Article ID: 4956504
Journal: Journal of Systems and Software
Published Year: 2017
Pages: 42
File Type: PDF
Abstract
Log files are generated in many different formats by a plethora of devices and software. Proper analysis of these files can yield useful information about various aspects of each system. Cloud computing appears well suited to this type of analysis, as it is capable of handling the high production rate, the large size and the diversity of log files. In this paper we investigated log file analysis with the cloud computing frameworks Apache™ Hadoop® and Apache Spark™. We developed realistic log file analysis applications in both frameworks and performed SQL-type queries on real Apache Web Server log files. Experiments were run with varying parameters in order to study and compare the performance of the two frameworks.
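To make the kind of "SQL-type query" on Apache Web Server logs more concrete, the following is a minimal single-machine Python sketch, not the authors' code. It assumes the Apache Common Log Format, and both the query (request count and total bytes per HTTP status, roughly `SELECT status, COUNT(*), SUM(bytes) FROM logs GROUP BY status`) and the function names `parse_line`, `map_phase` and `reduce_phase` are hypothetical. The work is split into a map step and a reduce step, the same shape that a Hadoop MapReduce job or a Spark `reduceByKey` would distribute across a cluster.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Assumed input: Apache Common Log Format, %h %l %u %t "%r" %>s %b.
# Combined Log Format lines also match, since trailing fields are ignored.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status, bytes) pairs; '-' in the size field means no body was sent."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing the whole job
        size = 0 if record["size"] == "-" else int(record["size"])
        yield record["status"], size


def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Aggregate per status code: (request count, total bytes), like GROUP BY."""
    totals: Dict[str, Tuple[int, int]] = defaultdict(lambda: (0, 0))
    for status, size in pairs:
        count, total = totals[status]
        totals[status] = (count + 1, total + size)
    return dict(totals)


# Hypothetical sample input (the first line is the standard Apache example).
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 1043',
    'not a log line',
]
print(reduce_phase(map_phase(sample)))
# {'200': (2, 3369), '404': (1, 0)}
```

Malformed lines are skipped rather than aborting the run, a choice that matters for real log files, which may contain truncated or non-conforming entries.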