دانلود رایگان مقاله: یک الگوریتم همگروهی مبتنی بر هش برای داده های طبقه بندی شده

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
6855451	660780	2016	12 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

A hash-based co-clustering algorithm for categorical data

ترجمه فارسی عنوان

یک الگوریتم همگروهی مبتنی بر هش برای داده های طبقه بندی شده

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

Biclustering Categorical data - داده های طبقه بندی شده Data mining - داده‌کاوی Text mining - متن‌کاوی Co-clustering - گروه خوشه ای

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی

پیش نمایش مقاله

یک الگوریتم همگروهی مبتنی بر هش برای داده های طبقه بندی شده

چکیده انگلیسی

Cluster analysis, or clustering, refers to the analysis of the structural organization of a data set. This analysis is performed by grouping together objects of the data that are more similar among themselves than to objects of different groups. The sampled data may be described by numerical features or by a symbolic representation, known as categorical features. These features often require a transformation into numerical data in order to be properly handled by clustering algorithms. The transformation usually assigns a weight for each feature calculated by a measure of importance (i.e., frequency, mutual information). A problem with the weight assignment is that the values are calculated with respect to the whole set of objects and features. This may pose as a problem when a subset of the features have a higher degree of importance to a subset of objects but a lower degree with another subset. One way to deal with such problem is to measure the importance of each subset of features only with respect to a subset of objects. This is known as co-clustering that, similarly to clustering, is the task of finding a subset of objects and features that presents a higher similarity among themselves than to other subsets of objects and features. As one might notice, this task has a higher complexity than the traditional clustering and, if not properly dealt with, may present an scalability issue. In this paper we propose a novel co-clustering technique, called HBLCoClust, with the objective of extracting a set of co-clusters from a categorical data set, without the guarantees of an enumerative algorithm, but with the compromise of scalability. This is done by using a probabilistic clustering algorithm, named Locality Sensitive Hashing, together with the enumerative algorithm named InClose. The experimental results are competitive when applied to labeled categorical data sets and text corpora. Additionally, it is shown that the extracted co-clusters can be of practical use to expert systems such as Recommender Systems and Topic Extraction.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Expert Systems with Applications - Volume 64, 1 December 2016, Pages 24-35

نویسندگان

FabrÃcio Olivetti de França,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

دانلود رایگان مقاله ISI : یک الگوریتم همگروهی مبتنی بر هش برای داده های طبقه بندی شده

دسترسی سریع

ارتباط

English Website