Reporting l most influential objects in uncertain databases based on probabilistic reverse top-k queries

Article ID	Journal	Published Year	Pages	File Type
4944475	Information Sciences	2017	23 Pages	PDF

Abstract

Reverse top-k queries were proposed from the perspective of product manufacturers and are essential for assessing a product's potential market. However, existing approaches to reverse top-k queries all assume that the underlying data are exact (certain). Because uncertain and certain data differ intrinsically, these methods cannot be applied directly to uncertain data sets. Motivated by this, we first model probabilistic reverse top-k queries over uncertain data. We then formulate the probabilistic top-l influential query, which reports the l most influential objects, i.e., those with the largest impact factors, where the impact factor of an object is the cardinality of its probabilistic reverse top-k query result set. We present effective pruning heuristics to speed up query processing. In particular, we exploit several properties of probabilistic threshold top-k queries and probabilistic skyline queries to reduce the search space. In addition, we estimate an upper bound on the number of potential users to reduce the cost of computing probabilistic reverse top-k queries for the candidate objects. Finally, we present efficient query algorithms that seamlessly integrate the proposed pruning strategies. Extensive experiments on both real-world and synthetic data sets demonstrate the efficiency and effectiveness of the proposed algorithms.
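To make the definitions of the probabilistic reverse top-k query and the impact factor concrete, the sketch below computes them by brute force. It is not the paper's algorithm and uses none of its pruning heuristics. It rests on several assumptions that are not stated in the abstract: each uncertain object is a set of independent discrete instances with probabilities, users are weight vectors scoring objects linearly with smaller scores treated as better, ties do not count as "beating" an object, and a user belongs to an object's result set when the object's top-k probability is at least a threshold `tau`. All function names (`topk_probability`, `prob_reverse_topk`, `top_l_influential`, etc.) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed data model: an uncertain object is a list of (attribute vector,
# probability) instances; different objects are independent.
Instance = Tuple[Tuple[float, ...], float]
UncertainObject = List[Instance]


def score(weights: Sequence[float], point: Sequence[float]) -> float:
    """Linear scoring function; smaller scores are treated as better."""
    return sum(w * x for w, x in zip(weights, point))


def prob_fewer_than_k(probs: List[float], k: int) -> float:
    """Pr(fewer than k independent events occur), via a Poisson-binomial DP."""
    dist = [1.0] + [0.0] * k  # dist[j] = Pr(j events so far); dist[k] means ">= k"
    for p in probs:
        new = [0.0] * (k + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1.0 - p)
            new[min(j + 1, k)] += mass * p
        dist = new
    return sum(dist[:k])


def topk_probability(objects: Dict[str, UncertainObject], q_id: str,
                     weights: Sequence[float], k: int) -> float:
    """Pr(object q_id is in the top-k for the weight vector `weights`)."""
    total = 0.0
    for point, p_inst in objects[q_id]:
        s = score(weights, point)
        # Probability that each other object strictly beats this instance of q.
        beat = [sum(p for pt, p in inst if score(weights, pt) < s)
                for oid, inst in objects.items() if oid != q_id]
        total += p_inst * prob_fewer_than_k(beat, k)
    return total


def prob_reverse_topk(objects: Dict[str, UncertainObject], q_id: str,
                      users: Dict[str, Sequence[float]], k: int,
                      tau: float) -> List[str]:
    """Users whose top-k contains q_id with probability at least tau."""
    return [uid for uid, w in users.items()
            if topk_probability(objects, q_id, w, k) >= tau]


def top_l_influential(objects: Dict[str, UncertainObject],
                      users: Dict[str, Sequence[float]], k: int,
                      tau: float, l: int) -> List[str]:
    """The l objects with the largest impact factors (ties broken by id)."""
    impact = {oid: len(prob_reverse_topk(objects, oid, users, k, tau))
              for oid in objects}
    return sorted(impact, key=lambda oid: (-impact[oid], oid))[:l]


# Small illustrative data set (made up for this example).
objects = {
    "A": [((1.0, 1.0), 1.0)],
    "B": [((0.5, 3.0), 0.5), ((3.0, 3.0), 0.5)],
    "C": [((2.0, 0.5), 1.0)],
}
users = {"u1": (0.5, 0.5), "u2": (0.9, 0.1), "u3": (0.1, 0.9)}
print(top_l_influential(objects, users, k=1, tau=0.4, l=2))  # → ['A', 'B']
```

With k = 1 and tau = 0.4, object A is in the top-1 of u1 with probability 1 and of u2 with probability 0.5, so its impact factor is 2; B and C each reach exactly one user (u2 and u3 respectively). This baseline evaluates every object against every user, which is the cost that pruning strategies such as those described above are meant to reduce.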

Keywords

Data management