Article ID: 4956318
Journal: Journal of Systems and Software
Published Year: 2018
Pages: 14
File Type: PDF
Abstract

• An improved strategy for inserting data into (e)SHTs.
• A new query mechanism to reduce IO.
• Proof that typical queries outperform their predecessors.
• Case study showing the benefits of this work on open-source software.

Understanding the behaviour of distributed computer systems with many threads and resources is a challenging task. Dynamic analysis tools such as tracers have been developed to assist programmers in debugging and optimizing the performance of such systems. However, complex systems can generate huge traces, with billions of events, which are hard to analyze manually. Trace visualization and analysis programs aim to solve this problem. Such software needs fast access to data, which a linear search through the trace cannot provide. Several programs have resorted to stateful analysis to rearrange data into more query-friendly structures.

In previous work, we suggested modifications to the State History Tree (SHT) data structure to correct its disk and memory usage. While the improved structure, eSHT, made near-optimal use of disk and reduced memory usage, we found that query performance, although twice as fast, exhibited scaling limitations.

In this paper, we propose a new structure using R-Tree techniques to improve query performance. We explain the hybrid scheme and the algorithms used to optimize the structure to model the expected behaviour. Finally, we benchmark the data structure on highly parallel traces and on a demanding trace visualization use case.

Our results show that the hybrid R-SHT structure retains the eSHT's optimal disk usage properties while providing a speedup of several orders of magnitude for queries on highly parallel traces.
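
To illustrate why a linear search scales poorly and how bounding-box pruning of the kind used in R-Trees can help, the following is a minimal sketch and not the R-SHT algorithm itself. It assumes state intervals carry an integer attribute key (for example, one per thread) and groups them into flat leaves whose time and key bounds are checked before any interval is read. The names `Interval`, `Leaf`, `build_leaves`, `linear_query` and `pruned_query` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interval:
    start: int   # first timestamp at which the state holds
    end: int     # last timestamp at which the state holds (inclusive)
    key: int     # hypothetical attribute id, e.g. one per thread
    value: str   # state value during [start, end]


@dataclass
class Leaf:
    intervals: List[Interval]
    start: int    # earliest start among the leaf's intervals
    end: int      # latest end among the leaf's intervals
    min_key: int  # smallest key stored in the leaf
    max_key: int  # largest key stored in the leaf


def build_leaves(intervals: List[Interval], leaf_size: int = 4) -> List[Leaf]:
    """Group intervals into leaves and record each leaf's bounding box."""
    # Sorting by (key, start) keeps intervals of the same attribute together,
    # which keeps the key bounds of each leaf narrow (an assumption of this sketch).
    ordered = sorted(intervals, key=lambda iv: (iv.key, iv.start))
    leaves = []
    for i in range(0, len(ordered), leaf_size):
        chunk = ordered[i:i + leaf_size]
        leaves.append(Leaf(
            intervals=chunk,
            start=min(iv.start for iv in chunk),
            end=max(iv.end for iv in chunk),
            min_key=min(iv.key for iv in chunk),
            max_key=max(iv.key for iv in chunk),
        ))
    return leaves


def linear_query(intervals: List[Interval], key: int, t: int) -> Optional[str]:
    """Baseline: scan every interval until one matches (key, t)."""
    for iv in intervals:
        if iv.key == key and iv.start <= t <= iv.end:
            return iv.value
    return None


def pruned_query(leaves: List[Leaf], key: int, t: int) -> Tuple[Optional[str], int]:
    """Skip leaves whose bounding box cannot contain (key, t).

    Returns the state value and the number of leaves actually scanned.
    """
    scanned = 0
    for leaf in leaves:
        in_keys = leaf.min_key <= key <= leaf.max_key
        in_time = leaf.start <= t <= leaf.end
        if not (in_keys and in_time):
            continue  # pruned: no interval in this leaf can match
        scanned += 1
        for iv in leaf.intervals:
            if iv.key == key and iv.start <= t <= iv.end:
                return iv.value, scanned
    return None, scanned
```

In an on-disk structure each leaf would correspond to a block read from storage, so the count of scanned leaves returned by `pruned_query` stands in for the IO a query incurs. A real tree would also nest these bounds over several levels rather than keep a flat list of leaves.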

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications