Article ID Journal Published Year Pages File Type
11002412 Future Generation Computer Systems 2019 21 Pages PDF
Abstract
By using Apache Spark as illustration, we show how various data stream processing efficiency issues can be mitigated or optimized by our distributed tracing engine. We describe and qualitatively compare two different designs, one based on reporting to a distributed database and another based on trace piggybacking. Our prototype implementation consists of wrappers suitable for JVM environments in general, with minimal impact on the source code of the core system. Our tracing framework is the first to solve tracing in multiple systems across boundaries and to provide detailed performance measurements suitable for automated optimization, not just debugging.
Related Topics
Physical Sciences and Engineering Computer Science Computational Theory and Mathematics
Authors
, , , ,