Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
495860 | Applied Soft Computing | 2014 | 19 Pages |
• Two online soft subspace clustering algorithms are proposed by adopting an online learning strategy.
• We extend online subspace clustering and present two streaming soft subspace clustering algorithms.
• Our methods benefit from the online learning scheme and a scalable clustering strategy.
• Online soft subspace clustering achieves better results than the batch versions.
• Streaming soft subspace clustering obtains results as good as the online and batch ones.
A key challenge for most conventional clustering algorithms in handling many real-world problems is that data points in different clusters are often correlated with different subsets of features. To address this problem, subspace clustering has attracted increasing attention in recent years. In practical data mining applications, data points may arrive in continuous streams, with chunks of samples being collected at different time points. In addition, huge amounts of data often cannot be kept in main memory because of memory restrictions. Accordingly, a range of evolving clustering algorithms has been proposed. However, traditional evolving clustering methods cannot be effectively applied to large-scale high-dimensional data and data streams. In this study, we extend the online learning strategy and scalable clustering technique to soft subspace clustering to form evolving soft subspace clustering. We propose two online soft subspace clustering algorithms, OFWSC and OEWSC, and two streaming soft subspace clustering algorithms, SSSC_F and SSSC_E. The proposed evolving soft subspace clustering leverages the effectiveness of the online learning scheme and scalable clustering methods for streaming data by revealing the important local subspace characteristics of high-dimensional data. Substantial experimental results on both artificial and real-world datasets demonstrate that our proposed methods are generally effective in evolving clustering and achieve superior performance over existing soft subspace clustering techniques.
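To make the idea of soft subspace clustering on a stream more concrete, here is a minimal sketch under stated assumptions. It is not the paper's OFWSC, OEWSC, SSSC_F, or SSSC_E; the abstract does not give their update rules. The sketch assumes a generic scheme: each cluster keeps its own feature weights, computed as a softmax over negative per-feature dispersion (in the spirit of entropy-weighted k-means), and points are processed one at a time with running-mean center updates. The class name `OnlineSoftSubspaceSketch`, the parameter `gamma`, and the toy data are all hypothetical.

```python
import math


def entropy_weights(dispersion, gamma=1.0):
    """Softmax over negative dispersion: tight features get larger weights."""
    m = min(dispersion)  # shift for numerical stability
    exps = [math.exp(-(d - m) / gamma) for d in dispersion]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sq_dist(x, center, weights):
    """Squared distance in which each feature is scaled by the cluster's weight."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center))


class OnlineSoftSubspaceSketch:
    """Hypothetical illustration only, not the authors' algorithm."""

    def __init__(self, init_centers, gamma=1.0):
        self.centers = [list(c) for c in init_centers]
        dim = len(self.centers[0])
        # Assumption: each initial center counts as one observed point.
        self.counts = [1] * len(self.centers)
        self.dispersion = [[0.0] * dim for _ in self.centers]
        self.weights = [[1.0 / dim] * dim for _ in self.centers]
        self.gamma = gamma

    def partial_fit(self, x):
        """Assign one point, then update that cluster's center and weights."""
        k = min(
            range(len(self.centers)),
            key=lambda i: weighted_sq_dist(x, self.centers[i], self.weights[i]),
        )
        self.counts[k] += 1
        n = self.counts[k]
        for j, xj in enumerate(x):
            old = self.centers[k][j]
            new = old + (xj - old) / n  # running mean
            # Welford-style running sum of squared deviations per feature.
            self.dispersion[k][j] += (xj - old) * (xj - new)
            self.centers[k][j] = new
        self.weights[k] = entropy_weights(self.dispersion[k], self.gamma)
        return k


# Toy stream: cluster 0 is tight in feature 0, cluster 1 is tight in feature 1.
stream = [
    (0.0, 2.0), (10.0, 10.1), (0.1, -2.0), (12.0, 9.9),
    (-0.1, 1.5), (8.0, 10.0), (0.0, -1.5), (11.0, 10.0),
]
model = OnlineSoftSubspaceSketch(init_centers=[(0.0, 0.0), (10.0, 10.0)], gamma=1.0)
labels = [model.partial_fit(x) for x in stream]
```

In this toy run each cluster ends up weighting the feature along which its points are most compact, which is the "local subspace" behavior the abstract refers to. Only one point is held in memory at a time, which is the property that matters for streams.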
Graphical abstract
Evolving soft subspace clustering can achieve good clustering performance for high-dimensional text data streams. The figure reports the important relevant words (dimensions) in each cluster, illustrating the feature weightings of each dimension for text dataset A2 detected by SSSC_E. In the graphs, the x-axis represents the feature (word) index, and the y-axis represents the feature weighting. From the feature weightings, we can easily identify the important relevant dimensions of each cluster, which represent the keywords of the corresponding topic, e.g., jpeg, pixel, gif, displai for category comp.graphics and bibl, theist, jesus, faith for category alt.atheism. The keywords with high weightings can be used to understand the topics in document clustering analysis.
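Reading keywords off the feature weightings amounts to ranking each cluster's weights and taking the top entries. The sketch below shows that step in isolation. The helper `top_keywords` is hypothetical, and the weight values are invented for illustration; they are not results from the paper. Only the stemmed words themselves come from the text above.

```python
def top_keywords(weights, vocabulary, k=4):
    """Return the k vocabulary terms with the largest feature weights."""
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return [vocabulary[j] for j in ranked[:k]]


# Stemmed terms taken from the graphical abstract; the weights are made up.
vocabulary = ["jpeg", "bibl", "pixel", "theist", "gif", "jesus", "displai", "faith"]
illustrative_weights = {
    "comp.graphics": [0.30, 0.01, 0.25, 0.02, 0.20, 0.01, 0.18, 0.03],
    "alt.atheism": [0.02, 0.31, 0.01, 0.26, 0.03, 0.19, 0.01, 0.17],
}

keywords = {
    topic: top_keywords(w, vocabulary, k=4)
    for topic, w in illustrative_weights.items()
}
```

With a real weight matrix, the same ranking would be applied to one row per cluster, with `vocabulary` mapping feature indices (the x-axis of the graphs) back to words.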