Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
458409 | Journal of Systems and Software | 2015 | 23 Pages |
• D-P2P-Sim+ supports simulations with multi-million nodes storing large amounts of data.
• It is an ideal platform for processing the collected data in a fully customized way.
• It includes a robust framework that makes it easy to work with failure and recovery scenarios.
• All operations, as well as robust statistics, are available through a simple Java GUI.
Recent technologies such as the Internet of Things (IoT) and Web 2.0 raise a critical problem: how to store and process the large amounts of data they collect. In this paper we develop and evaluate distributed infrastructures for storing and processing such data. We present a distributed framework that supports customized deployment of a variety of indexing engines over million-node overlays. The framework provides an integrated set of tools that lets applications processing large amounts of data evaluate and test the performance of various application protocols in very large-scale deployments (multi-million nodes, billions of keys). The key aim is to provide an environment that supports decisions about which protocol to choose in P2P storage systems for a variety of big data applications. Using lightweight and efficient collection mechanisms, our system enables real-time registration of multiple measures and integrates support for real-life parameters such as node failure models and recovery strategies. Experiments were performed on the PlanetLab network and in a typical research laboratory to verify scalability and show maximum re-usability of our setup. The D-P2P-Sim+ framework is publicly available at http://code.google.com/p/d-p2p-sim/downloads/list.