Article ID: 457341
Journal: Journal of Network and Computer Applications
Published Year: 2014
Pages: 12
File Type: PDF
Abstract

Tracing, monitoring, and similar analysis tools add new requirements to the old problem of coping with asynchronous clocks in distributed systems. Existing approaches based on the convex hull can achieve excellent accuracy for a posteriori analysis, but impose significant cost and latency when used in live mode and over large clusters. We propose a novel method, LIANA (Live Incremental Asynchronous Network Analysis), that incrementally computes the clock offset along each communication link, updates it as the network evolves, and selects the best synchronization paths and time reference node. Computing the clock skew and offset between two connected nodes requires message exchanges; LIANA relies on the trace events recorded for the existing TCP/IP traffic between nodes. After the offset and its accuracy have been computed for every connection in the network graph, a minimum spanning tree is built from the edges with the best accuracy. A central node is then selected as the time reference, so that the offset from any node to this reference can be computed optimally. LIANA is efficient in terms of both synchronization accuracy and time complexity. The method, which is used for online distributed trace synchronization, has been evaluated in realistic scenarios with a diverse set of network topologies and traffic. We show that LIANA produces precise results efficiently, which makes it suitable for large cloud-distributed systems.
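
The pipeline described in the abstract (per-link offsets, a spanning tree over the most accurate links, then a reference node) can be illustrated with a small Python sketch. This is not the LIANA implementation: the function names, the `(u, v, offset, error)` link tuples, the use of Kruskal's algorithm, the additive accumulation of error along a path, and the rule "pick the node with the smallest worst-case accumulated error" are all assumptions made for illustration only.

```python
from collections import defaultdict, deque


def minimum_spanning_tree(nodes, links):
    """Kruskal's algorithm over links given as (u, v, offset, error).

    Assumed convention: clock(v) ≈ clock(u) + offset, and `error` is a
    hypothetical uncertainty of that estimate (lower is better).
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for u, v, offset, error in sorted(links, key=lambda link: link[3]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # keep the link only if it joins two components
            parent[root_u] = root_v
            tree.append((u, v, offset, error))
    return tree


def offsets_from(reference, tree):
    """Return {node: (clock offset relative to reference, accumulated error)}."""
    adjacency = defaultdict(list)
    for u, v, offset, error in tree:
        adjacency[u].append((v, offset, error))
        adjacency[v].append((u, -offset, error))  # reverse direction flips sign

    result = {reference: (0.0, 0.0)}
    queue = deque([reference])
    while queue:
        node = queue.popleft()
        base_offset, base_error = result[node]
        for neighbour, offset, error in adjacency[node]:
            if neighbour not in result:
                # Assumption: errors simply add up along the path.
                result[neighbour] = (base_offset + offset, base_error + error)
                queue.append(neighbour)
    return result


def pick_reference(nodes, tree):
    """Choose the node whose worst accumulated error to any node is smallest."""
    def worst_error(node):
        return max(err for _, err in offsets_from(node, tree).values())
    return min(nodes, key=worst_error)


# Made-up example data: four nodes, four measured links.
nodes = ["A", "B", "C", "D"]
links = [
    ("A", "B", 5.0, 0.1),
    ("B", "C", -2.0, 0.2),
    ("A", "C", 3.5, 0.9),  # least accurate link, expected to be left out
    ("C", "D", 1.0, 0.3),
]
tree = minimum_spanning_tree(nodes, links)
reference = pick_reference(nodes, tree)
offsets = offsets_from(reference, tree)
```

In a live setting the link estimates would change as new trace events arrive, so the tree and reference would need to be recomputed or updated incrementally; this sketch only covers a single static snapshot.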

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Networks and Communications