کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
415116 681176 2014 15 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
A Near-Linear Time Subspace Search Scheme for Unsupervised Selection of Correlated Features
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
A Near-Linear Time Subspace Search Scheme for Unsupervised Selection of Correlated Features
چکیده انگلیسی

In many real-world applications, data is collected in high dimensional spaces. However, not all dimensions are relevant for data analysis. Instead, interesting knowledge is hidden in correlated subsets of dimensions (i.e., subspaces of the original space). Detecting these correlated subspaces independently of the underlying mining task is an open research problem. It is challenging due to the exponential search space. Existing methods have tried to tackle this by utilizing Apriori search schemes. However, their worst case complexity is exponential in the number of dimensions; and even in practice they show poor scalability while missing high quality subspaces.This paper features a scalable subspace search scheme (4S), which overcomes the efficiency problem by departing from the traditional levelwise search. We propose a new generalized notion of correlated subspaces which gives way to transforming the search space to a correlation graph of dimensions. We perform a direct mining of correlated subspaces in this graph, and then, merge subspaces based on the MDL principle in order to obtain high dimensional subspaces with minimal redundancy. We theoretically show that our search scheme is more general than existing search schemes. Our empirical results reveal that 4S in practice scales near-linearly with both database size and dimensionality, and produces higher quality subspaces than state-of-the-art methods.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Big Data Research - Volume 1, August 2014, Pages 37–51
نویسندگان
, , ,