Article ID | Journal ID | Publication year | English article | Full text |
---|---|---|---|---|
403452 | 677231 | 2016 | 18-page PDF | Free download |
We present a streaming data clustering algorithm that allows data records to be removed at any time. Existing algorithms track evolving clusters by letting the weight of old data decay over time. We instead consider the case in which data records have no pre-specified lifetime, which is characteristic of many real data sets such as bank accounts: users can add or remove a record at any time. The algorithm processes each datum in a one-pass, throw-away fashion, without storing the whole data set. We also propose a technique for merging several micro-clusters into a single hyper-cylindrical micro-cluster, which reduces the number of micro-clusters in the feature space and thus the computation. The algorithm is evaluated on several synthetic and real data sets, and it outperforms other state-of-the-art algorithms on several indices of clustering performance.
Journal: Knowledge-Based Systems - Volume 99, 1 May 2016, Pages 183–200
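The abstract does not specify how a micro-cluster is represented. One common way to let a one-pass algorithm forget a record is to keep only additive statistics per micro-cluster: the count, the linear sum, and the sum of squares, as in BIRCH-style clustering features. Removing a record is then a subtraction. The sketch below assumes that representation. The class and method names are hypothetical. Its `merge` is a plain additive merge and is not the hyper-cylindrical merging technique the paper proposes.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Hypothetical micro-cluster summary that supports insertion and removal.

    Only the record count n, the per-dimension linear sum ls and the
    per-dimension sum of squares ss are kept, so each record can be
    discarded once it has been processed (one-pass, throw-away).
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls: List[float] = [0.0] * dim
        self.ss: List[float] = [0.0] * dim

    def _update(self, x: Sequence[float], sign: int) -> None:
        # Apply +x (insertion) or -x (removal) to the additive statistics.
        if len(x) != len(self.ls):
            raise ValueError("dimension mismatch")
        self.n += sign
        for i, v in enumerate(x):
            self.ls[i] += sign * v
            self.ss[i] += sign * v * v

    def add(self, x: Sequence[float]) -> None:
        self._update(x, +1)

    def remove(self, x: Sequence[float]) -> None:
        # Assumes x was previously added to this micro-cluster; the
        # summary statistics alone cannot verify that.
        if self.n == 0:
            raise ValueError("cannot remove from an empty micro-cluster")
        self._update(x, -1)

    def merge(self, other: "MicroCluster") -> "MicroCluster":
        # Generic additive merge (NOT the paper's hyper-cylindrical merge).
        if len(other.ls) != len(self.ls):
            raise ValueError("dimension mismatch")
        out = MicroCluster(len(self.ls))
        out.n = self.n + other.n
        out.ls = [a + b for a, b in zip(self.ls, other.ls)]
        out.ss = [a + b for a, b in zip(self.ss, other.ss)]
        return out

    def centroid(self) -> List[float]:
        if self.n == 0:
            raise ValueError("empty micro-cluster has no centroid")
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the summarized records from the centroid.
        c = self.centroid()
        var = sum(sq / self.n - ci * ci for sq, ci in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative round-off


if __name__ == "__main__":
    mc = MicroCluster(2)
    for rec in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
        mc.add(rec)
    mc.remove([3.0, 4.0])  # a user deletes one record at an arbitrary time
    print(mc.n, mc.centroid())  # 2 [3.0, 4.0]
```

Because every statistic is a sum over records, deleting a record costs the same as inserting one and needs no access to the other records. The trade-off is that the caller must know which micro-cluster the deleted record was absorbed into.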