Article code: 402848
Journal code: 677021
Publication year: 2013
English article: 11 pages, PDF
Full-text version: free download
English title of the ISI article
Finding multiple global linear correlations in sparse and noisy data sets
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
Abstract

Finding linear correlations is an important research problem with numerous real-world applications. In real-world data sets, a linear correlation may not hold across the entire data set; some linear correlations are visible only in certain data subsets. On the one hand, many local correlation clustering algorithms assume that the data points of a linear correlation are locally dense. These methods may miss some global correlations when data points are sparsely distributed. On the other hand, existing global correlation clustering methods may fail when the data set contains a large number of non-correlated points or the actual correlations are coarse. This paper proposes a simple and fast algorithm, DCSearch, for finding multiple global linear correlations in a data set. The algorithm is able to find coarse, global linear correlations in noisy and sparse data sets. Using the classical divide-and-conquer strategy, it first divides the data set into subsets to reduce the search space, and then recursively searches and prunes the candidate correlations from the subsets. Empirical studies show that DCSearch can efficiently reduce the number of candidate correlations during each iteration. Experimental results on both synthetic and real data sets demonstrate that DCSearch is effective and efficient in finding global linear correlations in sparse and noisy data sets.
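
The abstract gives only a high-level outline of the method, so the following is a minimal illustrative sketch of the general idea (divide the data into subsets, propose candidate correlations within each subset, prune candidates by their support in the full data set). It is not the DCSearch algorithm from the paper. It assumes 2-D points, performs a single divide/search/prune pass instead of a recursive one, and all names and parameters (`find_lines`, `n_subsets`, `tol`, `min_support`) are hypothetical.

```python
import itertools
import math


def line_through(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 and a^2 + b^2 = 1, or None if p == q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    if norm == 0:
        return None
    a, b = a / norm, b / norm
    return a, b, -(a * x1 + b * y1)


def inliers(line, points, tol):
    """Indices of points whose perpendicular distance to the line is at most tol."""
    a, b, c = line
    return {i for i, (x, y) in enumerate(points) if abs(a * x + b * y + c) <= tol}


def find_lines(points, n_subsets=4, tol=0.05, min_support=10):
    """Illustrative divide-then-prune search for lines (hypothetical, not DCSearch)."""
    # Divide: candidate pairs are drawn only within a subset,
    # which shrinks the number of pairs that must be examined.
    idx = list(range(len(points)))
    subsets = [idx[k::n_subsets] for k in range(n_subsets)]

    # Search: every pair of points inside a subset proposes one candidate line.
    candidates = []
    for subset in subsets:
        for i, j in itertools.combinations(subset, 2):
            line = line_through(points[i], points[j])
            if line is None:
                continue
            support = inliers(line, points, tol)
            # Prune: drop candidates supported by too few points of the full set.
            if len(support) >= min_support:
                candidates.append((line, support))

    # Select greedily: take the best-supported candidate among the points
    # not yet explained, remove those points, and repeat.
    found, remaining = [], set(idx)
    while candidates:
        line, support = max(candidates, key=lambda c: len(c[1] & remaining))
        live = support & remaining
        if len(live) < min_support:
            break
        found.append((line, sorted(live)))
        remaining -= live
    return found
```

Each returned entry pairs a normalized line `(a, b, c)` with the indices of the points assigned to it. In this sketch, `tol` plays the role of how coarse a correlation may be, and `min_support` controls how many points a correlation needs before it is reported.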

Publisher
Database: Elsevier - ScienceDirect
Journal: Knowledge-Based Systems - Volume 53, November 2013, Pages 40–50