Article ID Journal Published Year Pages File Type
386246 Expert Systems with Applications 2014 12 Pages PDF
Abstract

•We propose a graph-structural index called the RG-index for efficient SPARQL query processing.•We propose an efficient building algorithm for the RG-index adapted from the gSpan algorithm.•We present the RFLT operator, which conducts triple filtering efficiently.•We develop the cost model and the cardinality estimation method for the RFLT operator.•We conducted comprehensive experiments with very large-scale real-life and synthetic RDF datasets.

As the size of Resource Description Framework (RDF) graphs has grown rapidly, SPARQL query processing on the large-scale RDF graph has become a more challenging problem. For efficient SPARQL query processing, the handling of the intermediate results is the most crucial element because it generally involves many join operators. Recently, a triple filtering method, called the RP-filter, which uses a path-based index, was proposed. It can reduce the intermediate results effectively by filtering out irrelevant triples in advance. However, its filtering power is limited, because it uses only the path information of the RDF graph. In this paper, we extend the triple filtering method to exploit the graph-structural information, and propose the RDF graph index (RG-index). We address the problem of the RG-index, which is caused by the indexing of the graph patterns, by indexing only effective graph patterns for the triple filtering. In addition, we propose an efficient method for building the RG-index in which a frequent graph pattern mining algorithm is adapted. We conducted comprehensive experiments on large-scale RDF datasets and demonstrated that the RG-index can reduce redundant intermediate results more effectively than can the RP-filter.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,