Article ID Journal Published Year Pages File Type
6858626 Information Systems 2017 19 Pages PDF
Abstract
Constant technological advances in electronic devices have led to the growth of elaborated data such as large texts, time series, georeferenced imagery, genetic sequences, photos, videos and several other types of complex data. Differently from scalar, traditional data types such as numbers and strings, complex data do not present the order relation property, which allows identifying whether an element precedes another according to some criterion. Therefore, these data are usually compared by the similarity degree among them. The Metric Access Methods (MAMs) are recognized as well-suited to perform similarity queries over such kind of data more efficiently than other access methods. MAMs can be considered dynamic or static depending on the pivot type used to construct them. Pivots are often employed to narrow the search for data. Global pivots can be employed to look into elements in the whole dataset, thus they have a high impact in the process of pruning irrelevant elements, since a single global pivot can be used to discard a large amount of irrelevant elements. Nevertheless, MAMs based on global pivots may have their dynamicity compromised by the fact that eventual pivot-related updates must be propagated through the entire structure. Local pivots, on the other hand, allow the maintenance to occur locally at the price of a lower pruning ability. In this paper, we propose novel techniques for improving the performance of dynamic MAMs without harming their dynamicity, once that several applications handle online complex data and, consequently, demand efficient dynamic indexes to be successful. Specifically, our main contributions are three techniques: (i) CLAP, which consists of employing local additional pivots to reduce distance calculations; (ii) ACIR, which is combined with CLAP and anticipates information from child nodes to reduce unnecessary disk accesses; and (iii) SCOOP, which is combined with CLAP as an extended version of ACIR, anticipating a larger amount of information from child nodes. The techniques have been applied to a dynamic MAM and evaluated over real datasets ranging from moderate to high dimensionality and cardinality. The experimental results show that our techniques were able to reduce query execution time in up to 63% for point queries and up to 53% for queries retrieving multiple elements.
Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , ,