A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
6938705	1449964	2018	42 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

00-01 99-00 DBSCAN

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو

پیش نمایش صفحه اول مقاله

A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

چکیده انگلیسی

Clustering is an important technique to deal with large scale data which are explosively created in internet. Most data are high-dimensional with a lot of noise, which brings great challenges to retrieval, classification and understanding. No current existing approach is “optimal” for large scale data. For example, DBSCAN requires O(n2) time, Fast-DBSCAN only works well in 2 dimensions, and Ï-Approximate DBSCAN runs in O(n) expected time which needs dimension D to be a relative small constant for the linear running time to hold. However, we prove theoretically and experimentally that Ï-Approximate DBSCAN degenerates to an O(n2) algorithm in very high dimension such that 2Dâ¯>â¯â¯>â¯n. In this paper, we propose a novel local neighborhood searching technique, and apply it to improve DBSCAN, named as NQ-DBSCAN, such that a large number of unnecessary distance computations can be effectively reduced. Theoretical analysis and experimental results show that NQ-DBSCAN averagely runs in O(n*log(n)) with the help of indexing technique, and the best case is O(n) if proper parameters are used, which makes it suitable for many realtime data.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition - Volume 83, November 2018, Pages 375-387

نویسندگان

Yewang Chen, Shengyu Tang, Nizar Bouguila, Cheng Wang, Jixiang Du, HaiLin Li,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

A fast clustering algorithm based on pruning unnecessary distance computations in DBSCAN for high-dimensional data

دسترسی سریع

ارتباط

English Website