Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
461133 | 696562 | 2013 | 15-page PDF | Free download |

Sequential pattern mining (SPM) is an important technique for discovering time-related behavior in sequence databases. In real-life applications, the items in a sequence database do not all occur with the same frequency. If a single minimum support (minsup) is applied to all items, the rare item problem may result: interesting patterns cannot be retrieved effectively whether minsup is set too high or too low. Liu (2006) first introduced the concept of multiple minimum supports (MMSs) into SPM. It allows users to specify a minimum item support (MIS) for each item according to its natural frequency. A generalized sequential pattern-based algorithm, named Multiple Supports – Generalized Sequential Pattern (MS-GSP), was also developed to mine the complete set of sequential patterns. However, MS-GSP adopts a candidate generate-and-test approach, which has been recognized as a costly and time-consuming method of pattern discovery. For the efficient mining of sequential patterns with MMSs, this study first proposes a compact data structure, called a Preorder Linked Multiple Supports tree (PLMS-tree), to store and compress the entire sequence database. Based on the PLMS-tree, we develop an efficient algorithm, Multiple Supports – Conditional Pattern growth (MSCP-growth), to discover the complete set of patterns. The experimental results show that the proposed approach performs better than MS-GSP and conventional SPM.
► Multiple minimum supports can prune the search space in sequential pattern mining (SPM).
► An efficient tree-based method is proposed for SPM with multiple minimum supports.
► The algorithm is evaluated on both synthetic and real-life datasets.
► Experimental results show our method is more efficient than traditional methods.
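The rare item problem and the role of per-item MIS values described in the abstract can be illustrated with a small brute-force sketch. It is not the PLMS-tree or MSCP-growth, and it makes two assumptions that the abstract does not state: a pattern's threshold is taken to be the lowest MIS among its items (the usual formulation in the multiple-minimum-supports literature), and each sequence element is a single item rather than an itemset. The database, the MIS values, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[List[str]]) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database) / len(database)


def is_frequent_single(pattern, database, minsup: float) -> bool:
    """Conventional SPM: one minsup shared by every item."""
    return support(pattern, database) >= minsup


def is_frequent_mms(pattern, database, mis: Dict[str, float]) -> bool:
    """Assumed MMS rule: threshold = lowest MIS among the pattern's items."""
    threshold = min(mis[item] for item in pattern)
    return support(pattern, database) >= threshold


# Hypothetical toy database: "blender" is rare, "bread" and "milk" are common.
DATABASE = [
    ["bread", "milk"],
    ["bread", "milk"],
    ["bread", "milk", "bread"],
    ["milk", "bread"],
    ["bread", "milk"],
    ["bread", "milk"],
    ["bread"],
    ["milk"],
    ["bread", "blender", "milk"],
    ["blender", "bread"],
]

# Hypothetical per-item minimum supports, lower for the rare item.
MIS = {"bread": 0.5, "milk": 0.5, "blender": 0.1}

if __name__ == "__main__":
    for pattern in (["blender"], ["milk", "bread"], ["bread", "milk"]):
        print(
            pattern,
            "high minsup:", is_frequent_single(pattern, DATABASE, 0.5),
            "low minsup:", is_frequent_single(pattern, DATABASE, 0.1),
            "MMS:", is_frequent_mms(pattern, DATABASE, MIS),
        )
```

In this toy data, a single high minsup drops the rare item, a single low minsup also admits the uncommon ordering of two common items, and the per-item thresholds keep the rare item while rejecting that ordering.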
Journal: Journal of Systems and Software - Volume 86, Issue 5, May 2013, Pages 1224–1238