Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
397370 | 671185 | 2014 | 10-page PDF | Free download |
• Disjunctive emerging patterns can be found via minimal transversals in hypergraphs.
• We propose a new algorithm suitable for parallel and distributed environments.
• Experiments show that our method is efficient in terms of memory usage and computing time.
• We identify the key features of datasets that most affect the performance of our method.
This paper investigates the problem of mining disjunctive emerging patterns in high-dimensional biomedical datasets. Disjunctive emerging patterns are sets of features that are very frequent among samples of a target class (for example, the cases in a case–control study) and very rare among all other samples. We demonstrate, for the first time, that this problem can be solved using minimal transversals in a hypergraph. We propose a new divide-and-conquer algorithm that efficiently computes disjunctive emerging patterns in parallel and distributed environments. We conducted experiments on real-world microarray gene expression datasets to assess the performance of our approach. The results show that our approach is more efficient than the state-of-the-art solution available in the literature. We thereby contribute to bioinformatics and data mining by providing a useful alternative for identifying patterns that distinguish samples with different class labels, such as those in case–control studies.
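To make the central notion concrete: a transversal of a hypergraph is a set of vertices that intersects every hyperedge, and it is minimal if no proper subset is also a transversal. The sketch below enumerates minimal transversals with the classical incremental (Berge-style) procedure. It is only an illustration of the concept, not the divide-and-conquer algorithm proposed in the paper, and it does not reproduce the paper's mapping from emerging patterns to hypergraphs. The function name `minimal_transversals` and the toy vertex labels `g1` to `g4` are assumptions made for this example.

```python
def minimal_transversals(hyperedges):
    """Enumerate all minimal transversals (minimal hitting sets).

    Illustrative Berge-style procedure: process one hyperedge at a
    time, extend the transversals that miss it, then discard every
    candidate that strictly contains another candidate.
    """
    transversals = {frozenset()}  # the empty set hits an empty hypergraph
    for edge in hyperedges:
        edge = frozenset(edge)
        candidates = set()
        for t in transversals:
            if t & edge:
                # t already intersects the new hyperedge: keep it unchanged
                candidates.add(t)
            else:
                # otherwise extend t with each vertex of the new hyperedge
                for v in edge:
                    candidates.add(t | {v})
        # keep only the inclusion-minimal candidates
        transversals = {
            c for c in candidates
            if not any(other < c for other in candidates)
        }
    return transversals


# Hypothetical toy hypergraph: vertices stand in for features (e.g. genes).
toy_edges = [{"g1", "g2"}, {"g2", "g3"}, {"g3", "g4"}]
result = minimal_transversals(toy_edges)
for t in sorted(sorted(s) for s in result):
    print(t)
```

For the toy hypergraph, the minimal transversals are {g1, g3}, {g2, g3}, and {g2, g4}. The number of minimal transversals can grow exponentially with the size of the hypergraph, which is one reason a scheme that splits the work across parallel or distributed workers, as the abstract describes, is attractive for high-dimensional data.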
Journal: Information Systems - Volume 40, March 2014, Pages 1–10