Article code | Journal code | Publication year | English article | Full-text version
379400 | 659299 | 2007 | 12-page PDF | Free download
English title of the ISI article
An empirical study on selective partitioning dimensions for partition-based similarity joins
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Real-world application data are usually distributed sparsely and non-uniformly in a high-dimensional space that is huge in size. Hence, selecting effective partitioning dimensions is crucial for partition-based similarity joins. In this paper, we present two data partitioning algorithms for evaluation. PerDimSelect selects some dimension axes from the pool of original perpendicular dimension axes and maps each data point into the reduced dimension space. DiaDimSelect creates a one-dimensional axis by combining some of the original perpendicular dimensions and maps each data point onto the newly created dimension. In the experiments, several measures are used to compare the performance of the algorithms, including CPU cost, total response time, and the number of created buckets. In conclusion, DiaDimSelect performs better than PerDimSelect because it creates far fewer partition buckets as the number of partitioning dimensions increases, which keeps the I/O cost low while decreasing the CPU cost considerably.
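
To make the contrast between the two mappings concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not say how dimensions are chosen or combined, so the sketch assumes the partitioning dimensions are given as a list `dims`, that buckets are grid cells of width `eps` (the join distance threshold), and that the "combined" axis is the unit diagonal of the selected dimensions. The function names `per_dim_key`, `dia_dim_key`, `partition` and `similarity_join` are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def per_dim_key(point, dims, eps):
    """PerDimSelect-style key (assumed form): grid cell of the point in the
    subspace spanned by the selected perpendicular axes."""
    return tuple(math.floor(point[d] / eps) for d in dims)


def dia_dim_key(point, dims, eps):
    """DiaDimSelect-style key (assumed form): cell index of the point's
    projection onto the unit diagonal of the selected axes."""
    projection = sum(point[d] for d in dims) / math.sqrt(len(dims))
    return (math.floor(projection / eps),)


def partition(points, key_fn, dims, eps):
    """Group point indices into buckets by their partition key."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[key_fn(p, dims, eps)].append(i)
    return buckets


def similarity_join(points, key_fn, dims, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is <= eps.

    Two points within eps of each other differ by at most eps along any
    axis or unit direction, so their cell indices differ by at most 1.
    Only the same bucket and its adjacent buckets need to be compared.
    """
    buckets = partition(points, key_fn, dims, eps)
    result = set()
    for key, members in buckets.items():
        for offset in itertools.product((-1, 0, 1), repeat=len(key)):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for i in members:
                for j in buckets.get(neighbour, ()):
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        result.add((i, j))
    return result
```

Under these assumptions, the perpendicular key has one coordinate per selected dimension, so the number of neighbouring cells to probe grows as 3^k with k partitioning dimensions, while the diagonal key always has a single coordinate and three neighbouring cells. Calling `len(partition(points, per_dim_key, dims, eps))` and `len(partition(points, dia_dim_key, dims, eps))` on the same data gives the bucket counts for each mapping.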

Publisher
Database: Elsevier - ScienceDirect
Journal: Data & Knowledge Engineering - Volume 63, Issue 2, November 2007, Pages 336–347