Article code: 6858628 | Journal code: 1438290 | Publication year: 2017 | English full text: 53 pages, PDF
English title (ISI article)
Distributed clustering of categorical data using the information bottleneck framework
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
Abstract
We perform large-scale clustering of categorical data using the Information Bottleneck (IB) framework, and we examine the performance of existing solutions on multiple machine architectures. The IB method uses information theory to recast database relations as probability distributions, and it measures the proximity of two tuples as the information lost when they are considered together. More precisely, we study the Agglomerative Information Bottleneck, the Sequential Information Bottleneck, and LIMBO, a newer approach that works on summaries of the original data. First, we evaluate the performance and limitations of these algorithms when confronted with large datasets on a single, powerful machine. We then propose new implementations that take advantage of distributed environments. Finally, using real datasets and large synthetic datasets tens of gigabytes in size, we evaluate their effectiveness and efficiency.
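The central idea of the abstract, recasting a relation as probability distributions and scoring tuple proximity as information loss, can be sketched in a few lines of Python. This is a minimal sketch assuming the standard agglomerative IB formulation used for categorical data: each of the n tuples gets weight p(t) = 1/n and a distribution p(v|t) = 1/m over its m attribute values, and merging two clusters costs their combined weight times the weighted Jensen-Shannon divergence of their distributions. The helper names (`tuple_distribution`, `merge_cost`, `aib`) and the toy relation are hypothetical; this is not the authors' (distributed) implementation.

```python
import math
from typing import Dict, List, Tuple

Dist = Dict[str, float]          # attribute value -> probability
Cluster = Tuple[float, Dist]     # (cluster weight p(c), distribution p(v|c))


def tuple_distribution(row: List[str], attrs: List[str]) -> Dist:
    # Tag each value with its attribute name so that equal strings in
    # different columns stay distinct; each value gets mass 1/m.
    m = len(row)
    return {f"{a}={v}": 1.0 / m for a, v in zip(attrs, row)}


def kl(p: Dist, q: Dist) -> float:
    # Kullback-Leibler divergence in bits; q must be > 0 wherever p > 0.
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def merge_cost(w1: float, p1: Dist, w2: float, p2: Dist) -> Tuple[float, Dist]:
    # Information loss of merging two clusters: (w1 + w2) times the
    # Jensen-Shannon divergence weighted by w1/(w1+w2) and w2/(w1+w2).
    w = w1 + w2
    pi1, pi2 = w1 / w, w2 / w
    keys = set(p1) | set(p2)
    merged = {k: pi1 * p1.get(k, 0.0) + pi2 * p2.get(k, 0.0) for k in keys}
    js = pi1 * kl(p1, merged) + pi2 * kl(p2, merged)
    return w * js, merged


def aib(clusters: List[Cluster], k: int) -> List[List[int]]:
    # Naive agglomerative IB: repeatedly merge the pair with the smallest
    # information loss until k clusters remain; returns member indices.
    clusters = list(clusters)
    members = [[i] for i in range(len(clusters))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, merged = merge_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j, merged)
        _, i, j, merged = best
        clusters[i] = (clusters[i][0] + clusters[j][0], merged)
        members[i] = members[i] + members[j]
        del clusters[j]          # j > i, so index i is unaffected
        del members[j]
    return members


# Toy relation with three tuples over three categorical attributes.
attrs = ["color", "shape", "size"]
rows = [["red", "circle", "small"],
        ["red", "circle", "large"],
        ["blue", "square", "large"]]
n = len(rows)
clusters = [(1.0 / n, tuple_distribution(r, attrs)) for r in rows]

cost_01, _ = merge_cost(*clusters[0], *clusters[1])
cost_02, _ = merge_cost(*clusters[0], *clusters[2])
print(round(cost_01, 4), round(cost_02, 4))  # 0.2222 0.6667
print(aib(clusters, 2))                      # [[0, 1], [2]]
```

Tuples 0 and 1 share two of three values, so merging them loses less information (2/9 bit) than merging tuples 0 and 2, which share none (2/3 bit). The exhaustive pair search in every merge step also illustrates why plain agglomerative IB scales poorly as the number of tuples grows.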
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 72, December 2017, Pages 161-178