کد مقاله | کد نشریه | سال انتشار | مقاله انگلیسی | نسخه تمام متن |
---|---|---|---|---|
431465 | 688555 | 2014 | 12 صفحه PDF | دانلود رایگان |
• A framework for processing streaming data in parallel in a distributed-memory manner is described.
• The framework performs message-passing to achieve parallelism, using either an MPI or sockets (ZMQ) backend.
• Performance of MPI versus sockets (ZMQ) is benchmarked for communicating large numbers of small messages.
• Three algorithms are presented for processing streaming graph edges to deduce structure and patterns in the graphs.
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database.We describe a lightweight, portable framework named PHISH which provides a communication model enabling a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH provides multiple communication backends including MPI and sockets/ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines.We illustrate how streaming MapReduce operations can be implemented using the PHISH communication model, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, sub-graph isomorphism matching, and connected component finding. We also provide benchmark timings comparing MPI and socket performance for several kernel operations useful in streaming algorithms.
Journal: Journal of Parallel and Distributed Computing - Volume 74, Issue 8, August 2014, Pages 2687–2698