Article ID: 379148
Journal: Data & Knowledge Engineering
Published Year: 2009
Pages: 17
File Type: PDF
Abstract

Data mining techniques are commonly used to extract patterns, such as association rules and decision trees, from huge volumes of data. Comparing patterns is a fundamental problem: it can be used, among other things, to concisely measure dissimilarities between evolving or different datasets and to compare the output produced by different data mining algorithms on the same dataset. In this paper, we present the Panda framework for computing the dissimilarity of both simple and complex patterns, which are defined upon raw data and upon other patterns, respectively. In Panda, the problem of comparing complex patterns is decomposed into simpler sub-problems on the component (simple or complex) patterns, and the partial solutions so obtained are then aggregated into an overall dissimilarity score. This intrinsically recursive approach gives Panda high flexibility and allows it to handle patterns with highly complex structures easily. Panda is built upon a few basic concepts so as to be generic and clear to the end user. We demonstrate the generality and flexibility of Panda by showing how it can be applied to a variety of pattern types, including sets of itemsets and clusterings.
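The recursive decomposition described in the abstract can be illustrated with a minimal sketch. It is not the Panda implementation: the abstract does not specify how components are matched or how partial scores are aggregated, so everything below is an assumption made for illustration. Simple patterns are modelled as itemsets (`frozenset`) compared by Jaccard distance, complex patterns as tuples of component patterns, components are matched to their nearest counterpart, and the partial scores are averaged. The names `jaccard_distance` and `dissimilarity` are hypothetical.

```python
from typing import FrozenSet, Union

# Assumed encoding: a simple pattern is a frozenset of items,
# a complex pattern is a tuple of component patterns (simple or complex).
Pattern = Union[FrozenSet[str], tuple]


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Dissimilarity of two simple patterns (assumed measure, in [0, 1])."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def dissimilarity(p: Pattern, q: Pattern) -> float:
    """Recursively compare two patterns of the same nesting structure."""
    p_simple = isinstance(p, frozenset)
    q_simple = isinstance(q, frozenset)

    # Base case: both patterns are simple, so compare them directly.
    if p_simple and q_simple:
        return jaccard_distance(p, q)
    if p_simple or q_simple:
        raise TypeError("cannot compare a simple pattern with a complex one")

    # Degenerate complex patterns with no components.
    if not p and not q:
        return 0.0
    if not p or not q:
        return 1.0

    # Recursive case: solve the sub-problems on the components, match each
    # component to its closest counterpart (assumed matching strategy), and
    # average the partial scores in both directions (assumed aggregation).
    def one_way(xs: tuple, ys: tuple) -> float:
        return sum(min(dissimilarity(x, y) for y in ys) for x in xs) / len(xs)

    return 0.5 * (one_way(p, q) + one_way(q, p))


# Two sets of itemsets, as might be produced by mining two datasets.
s1 = (frozenset({"a", "b"}), frozenset({"c"}))
s2 = (frozenset({"a", "b"}), frozenset({"c", "d"}))
score = dissimilarity(s1, s2)
```

Because the function calls itself on components, the same code also handles deeper nesting, for example a tuple of such sets of itemsets. Swapping in a different base measure, matching strategy, or aggregation function changes only the corresponding few lines.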
