Article code | Journal code | Publication year | English article | Full-text version
392526 | 664776 | 2016 | 28-page PDF | Free download
English title of the ISI article
Answering skyline queries on probabilistic data using the dominance of probabilistic skyline tuples
Persian translation of the title
پاسخ دادن به خطوط افقی در داده های احتمالاتی با استفاده از تسلط طوفان افقی احتمالاتی
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Although skyline queries are very useful in areas such as decision support, market analysis and personalized services, they have not been extensively studied in the context of uncertain data. The existing work on answering probabilistic skyline queries either requires a user to define a threshold (Pei et al., 2007) or returns all probabilistic skyline objects (Atallah and Qi, 2009). However, it is difficult to set the threshold: if it is set too high, important results may be lost, but if it is set too low or if there is no threshold, many low-quality results may be returned (Hua et al., 2011; Le and Cao, 2012; Le et al., 2013). In this paper, we identify two main challenges in answering probabilistic skyline queries. The first is defining which probabilistic skyline tuples are interesting enough to return to users. The second is efficiently finding these tuples without enumerating all possible worlds. We overcome the first challenge by introducing the bestpro-skyline query, which extends the dominance principle to also include the skyline probability of the probabilistic skyline tuples. This approach prunes the result set to just a very small number of the most interesting probabilistic skyline tuples without the need to set any user-defined threshold. We overcome the second challenge by using formulas based on probability theory to directly calculate the skyline probabilities without considering any possible worlds, and by developing algorithms to prune the search space. Experiments show that our solution finds the 17 interesting probabilistic skyline tuples among 13,095 tuples within 19 s on a real data set. Our solution outperforms a naive solution by up to three orders of magnitude in computation time.
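
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a simple tuple-level uncertainty model in which every tuple exists independently with a given probability, so that a tuple's skyline probability is its own existence probability multiplied by the probability that none of its dominators exists. It also assumes one possible reading of the extended dominance: the skyline probability is treated as an extra dimension in which larger is better. The function names, the smaller-is-better convention and the sample data are all hypothetical.

```python
from math import prod


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probabilities(tuples):
    """Skyline probability of each (values, existence_probability) pair.

    Assumption: tuples exist independently, so no possible worlds need to be
    enumerated; the probability is p(t) * product of (1 - p(s)) over every
    tuple s that dominates t.
    """
    result = []
    for values, p in tuples:
        no_dominator = prod(1.0 - q for other, q in tuples if dominates(other, values))
        result.append(p * no_dominator)
    return result


def extended_skyline(tuples):
    """Indices of tuples not dominated once skyline probability is a dimension.

    The skyline probability is negated so that "smaller is better" still holds.
    This is an illustrative reading of the idea, not the paper's definition.
    """
    probs = skyline_probabilities(tuples)
    points = [tuple(values) + (-sp,) for (values, _), sp in zip(tuples, probs)]
    return [i for i, pt in enumerate(points)
            if not any(dominates(other, pt) for other in points)]


# Hypothetical data: (attribute values, existence probability).
data = [((1, 4), 0.2), ((2, 5), 0.9), ((3, 1), 0.9), ((4, 2), 0.6)]
print(skyline_probabilities(data))
print(extended_skyline(data))
```

In this sample, the tuple (2, 5) is dominated by (1, 4) on its attributes but has a higher skyline probability, so it stays in the extended result, while (4, 2) is worse than (3, 1) on every dimension including skyline probability and is pruned. No threshold is involved at any point.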

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volumes 340–341, 1 May 2016, Pages 58–85