Density-based data partitioning strategy to approximate large-scale subgraph mining

Article ID	Journal	Published Year	Pages	File Type
396501	Information Systems	2015	11 Pages	PDF

Abstract

Graph mining approaches have recently become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. One of the most challenging tasks is frequent subgraph discovery. Interest in this task has been driven by the rapidly increasing size of existing graph databases, which creates an urgent need for efficient and scalable approaches to frequent subgraph discovery. In this paper, we propose a novel approach for large-scale subgraph mining by means of a density-based partitioning technique, using the MapReduce framework. Our partitioning aims to balance the computational load across a collection of machines. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases.
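
The abstract does not spell out the partitioning procedure, so the following is only an illustrative sketch of what a density-based split could look like, not the authors' algorithm. It assumes each graph is given as a vertex count plus an undirected edge list, uses the standard density formula 2|E| / (|V|(|V| - 1)), and deals the density-sorted graphs round-robin so that every partition receives a similar mix of sparse and dense graphs. The names `density` and `partition_by_density` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (number of vertices, list of undirected edges).
Graph = Tuple[int, List[Tuple[int, int]]]


def density(graph: Graph) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| * (|V| - 1))."""
    n, edges = graph
    if n < 2:
        return 0.0  # density is undefined for fewer than two vertices; treat as 0
    return 2.0 * len(edges) / (n * (n - 1))


def partition_by_density(graphs: List[Graph], k: int) -> List[List[Graph]]:
    """Sort graphs by density, then deal them round-robin into k partitions.

    This is an assumed balancing heuristic: dense graphs tend to be more
    expensive to mine, so spreading them evenly should even out the load.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(graphs, key=density)
    parts: List[List[Graph]] = [[] for _ in range(k)]
    for i, g in enumerate(ordered):
        parts[i % k].append(g)
    return parts


if __name__ == "__main__":
    database = [
        (3, [(0, 1), (1, 2), (0, 2)]),                          # triangle
        (3, [(0, 1)]),                                          # one edge
        (4, [(0, 1), (1, 2)]),                                  # short path
        (4, [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2), (1, 3)]),  # complete graph
    ]
    for idx, part in enumerate(partition_by_density(database, 2)):
        print(idx, [round(density(g), 3) for g in part])
```

In a MapReduce setting, each resulting partition would be handed to one map task that mines its local frequent subgraphs, with a reduce step combining the local results; that mapping is likewise an assumption about how such a split would be used.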

Keywords

Frequent subgraph mining; Graph density; Graph partitioning; Cloud computing; MapReduce