Article code: 535122
Journal code: 870323
Publication year: 2009
English article: 8-page PDF
Full-text version: free download
English title of the ISI article
A lower bound on the sample size needed to perform a significant frequent pattern mining task
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
English abstract

During the past few years, the problem of assessing the statistical significance of frequent patterns extracted from a given set S of data has received much attention. Since S is always a sample drawn from an unknown underlying distribution, two types of risk can arise during a frequent pattern mining process: accepting a false frequent pattern or rejecting a true one. In this context, many approaches presented in the literature assume that the dataset size is an application-dependent parameter. In this case, there is a trade-off between the two errors, leading to solutions that control only one risk to the detriment of the other. On the other hand, many sampling-based methods have attempted to determine the optimal size of S that ensures a good approximation of the original (potentially infinite) database from which S is drawn. However, these approaches often resort to Chernoff bounds, which do not allow the two risks to be controlled independently. In this paper, we overcome these drawbacks by providing a lower bound on the sample size required to control both risks and achieve a significant frequent pattern mining task.
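
For context, the kind of Chernoff-style sample-size calculation the abstract refers to can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard two-sided Hoeffding inequality for the estimated frequency of a single pattern, and the function name and the parameters `epsilon` and `delta` are hypothetical. It is not the lower bound proposed in the paper, which the abstract does not state. Note that one (`epsilon`, `delta`) pair governs deviations in both directions, so the two risks cannot be tuned separately.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n such that 2 * exp(-2 * n * epsilon**2) <= delta.

    Assumption (not from the paper): the frequency of one pattern is
    estimated from n i.i.d. transactions, and the two-sided Hoeffding
    inequality bounds the probability that the estimate deviates from
    the true frequency by at least epsilon.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    # Solve 2 * exp(-2 * n * epsilon^2) <= delta for n and round up.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


if __name__ == "__main__":
    # Hypothetical tolerances chosen only for illustration.
    print(hoeffding_sample_size(epsilon=0.01, delta=0.05))
```

Halving `epsilon` roughly quadruples the required sample size, while tightening `delta` only adds a logarithmic factor.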

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 30, Issue 11, 1 August 2009, Pages 960–967