Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
379164 | 659271 | 2009 | 18-page PDF | Free download |

A good clustering method should scale flexibly with both the number of dimensions and the size of a data set. This paper proposes a method for efficiently tracing the clusters of a high-dimensional online data stream. While the one-dimensional clusters of each dimension are traced independently, a technique similar to frequent itemset mining is employed to find the set of multi-dimensional clusters. By finding a frequently co-occurring set of one-dimensional clusters, it is possible to trace a multi-dimensional rectangular space whose range is defined collectively by those one-dimensional clusters. To trace such candidates over a multi-dimensional online data stream, a cluster-statistics tree (CS-tree) is proposed. A k-depth node (k ⩽ d) in the CS-tree corresponds to a k-dimensional rectangular space. Each node keeps track of the density of data elements in its corresponding rectangular space. Only a node corresponding to a dense rectangular space is allowed to have a child node. Scalability in the number of dimensions is greatly enhanced at the cost of a slight loss in the accuracy of the identified clusters.
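The tree structure described in the abstract can be illustrated with a rough Python sketch. This is not the paper's algorithm: it assumes the one-dimensional clusters are already known and fixed as intervals per dimension (the paper traces them online), and the names `CSTreeSketch`, `min_density`, `insert`, and `dense_rectangles` are hypothetical. It shows only the two structural rules stated above: a k-depth node stands for a k-dimensional rectangle, and only a dense node may gain a child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                 # data elements seen in this rectangle
    children: dict = field(default_factory=dict)   # (dim, interval index) -> Node


class CSTreeSketch:
    """Illustrative prefix tree over co-occurring one-dimensional clusters."""

    def __init__(self, intervals, min_density):
        # ASSUMPTION: intervals[dim] is a fixed list of (low, high) 1-D clusters.
        self.intervals = intervals
        self.min_density = min_density             # hypothetical density threshold
        self.root = Node()
        self.total = 0

    def _is_dense(self, node):
        return self.total > 0 and node.count / self.total >= self.min_density

    def _items(self, point):
        # Map a point to the 1-D clusters it falls into, ordered by dimension.
        items = []
        for dim, value in enumerate(point):
            for idx, (low, high) in enumerate(self.intervals[dim]):
                if low <= value <= high:
                    items.append((dim, idx))
                    break
        return items

    def insert(self, point):
        self.total += 1
        self.root.count += 1
        self._update(self.root, self._items(point))

    def _update(self, node, items):
        for i, item in enumerate(items):
            child = node.children.get(item)
            if child is None:
                if not self._is_dense(node):
                    continue                       # only dense nodes may have children
                child = node.children[item] = Node()
            child.count += 1
            self._update(child, items[i + 1:])

    def dense_rectangles(self):
        # Each result maps dimension -> (low, high) for one dense rectangle.
        found = []

        def walk(node, rect):
            for (dim, idx), child in node.children.items():
                if self._is_dense(child):
                    grown = {**rect, dim: self.intervals[dim][idx]}
                    found.append(grown)
                    walk(child, grown)

        walk(self.root, {})
        return found


tree = CSTreeSketch(intervals=[[(0.0, 1.0), (5.0, 6.0)], [(0.0, 1.0)]], min_density=0.5)
for p in [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5), (5.5, 9.0), (5.5, 0.5)]:
    tree.insert(p)
print(tree.dense_rectangles())
```

Because a child is created only after its parent becomes dense, a node's count can miss elements that arrived earlier. In this sketch that is one way the trade-off between scalability and accuracy mentioned in the abstract shows up.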
Journal: Data & Knowledge Engineering - Volume 68, Issue 3, March 2009, Pages 362–379