کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
415343 681201 2016 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
New upper bounds for tight and fast approximation of Fisher’s exact test in dependency rule mining
ترجمه فارسی عنوان
مرزهای بالایی جدید برای تقریب تنگ و سریع تست دقیق فیشر در معادله قانون وابستگی
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
چکیده انگلیسی


• A family of new tight upper bounds to approximate Fisher’s pp is introduced.
• The new approximations suit for data mining purposes, because they are much faster to evaluate than the exact pp-value.
• Theoretical analysis and empirical evaluation show that the new approximations are very accurate for all practical purposes.
• The approximations are not sensitive to the data size, distribution or small expected counts.

In the dependency rule mining, the goal is to discover the most significant statistical dependencies among all possible collapsed 2×22×2 contingency tables. Fisher’s exact test is a robust method to estimate the significance and it enables efficient pruning of the search space. The problem is that evaluating the required pp-value can be very laborious and the worst case time complexity is O(n)O(n), where nn is the data size. The traditional solution is to approximate the significance with the χ2χ2-measure, which can be estimated in a constant time. However, the χ2χ2-measure can produce unreliable results (discover spurious dependencies but miss the most significant dependencies). Furthermore, it does not support efficient pruning of the search space. As a solution, a family of tight upper bounds for Fisher’s pp is introduced. The new upper bounds are fast to calculate and approximate Fisher’s pp-value accurately. In addition, the new approximations are not sensitive to the data size, distribution, or smallest expected counts like the χ2χ2-based approximation. In practice, the execution time depends on the desired accuracy level. According to experimental evaluation, the simplest upper bounds are already sufficiently accurate for dependency rule mining purposes and they can be estimated in 0.004–0.1% of the time needed for exact calculation. For other purposes (testing very weak dependencies), one may need more accurate approximations, but even they can be calculated in less than 1% of the exact calculation time.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Computational Statistics & Data Analysis - Volume 93, January 2016, Pages 469–482
نویسندگان
,