Article ID: 487475
Journal: Procedia Computer Science
Published Year: 2015
Pages: 10 Pages
File Type: PDF
Abstract

This paper presents DPMine, a new approach for discovering large colossal pattern sequences from biological datasets. DPMine first discovers Doubleton Patterns, which are then enriched into a DPT+ tree to generate colossal pattern sequences using a vector intersection operator. DPMine uses a new integrated data structure called 'D-struct', a combination of a doubleton data matrix and a one-dimensional array pair set, to dynamically discover Doubleton Patterns from biological datasets. The DPT+ tree is constructed as a bitwise top-down column enumeration tree. A distinctive feature of D-struct is that its main-memory footprint is extremely limited and accurately predictable, and it runs very quickly under memory-based constraints. The algorithm is designed so that it takes only one scan over the database to discover large colossal pattern sequences. The empirical analysis shows that the proposed approach attains better mining efficiency on various biological datasets and outperforms Colossal Pattern Miner (CPM) and BVBUC in different settings. The performance of DPMine on biological datasets is also assessed with accuracy and F-measure.
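The abstract does not give the details of D-struct or the DPT+ tree, so the following is only a minimal sketch of the general idea it names: building per-item bit vectors in a single scan, finding frequent item pairs (doubletons), and extending them by vector intersection. The function names, the toy rows, and the use of Python integers as bit vectors are all assumptions made for illustration; this is not the authors' implementation.

```python
from itertools import combinations


def build_bit_vectors(rows):
    """Single scan over the rows: bit r of an item's vector is set
    when row r contains that item (illustrative encoding)."""
    vectors = {}
    for r, row in enumerate(rows):
        for item in row:
            vectors[item] = vectors.get(item, 0) | (1 << r)
    return vectors


def support(vector):
    """Number of rows covered by a bit vector."""
    return bin(vector).count("1")


def doubleton_patterns(vectors, min_support):
    """Item pairs whose intersected row vectors cover at least
    min_support rows, mapped to that intersected vector."""
    result = {}
    for a, b in combinations(sorted(vectors), 2):
        both = vectors[a] & vectors[b]  # vector intersection
        if support(both) >= min_support:
            result[(a, b)] = both
    return result


# Hypothetical toy dataset: each row is a set of item labels.
rows = [
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g3"},
    {"g1", "g2", "g3"},
]

vectors = build_bit_vectors(rows)
pairs = doubleton_patterns(vectors, min_support=2)

# Extending two overlapping doubletons to a longer pattern is again
# an intersection of their row vectors.
triple = pairs[("g1", "g2")] & pairs[("g2", "g3")]
print(sorted(pairs), support(triple))
```

Because each pattern carries its own row vector, the original rows are not read again after the first pass; longer patterns are obtained purely by intersecting vectors already in memory.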
