Article ID: 530786; Journal ID: 869788; Publication year: 2012; English article: 12 pages; Full text: PDF
Title (ISI-indexed article)
Hybrid cluster ensemble framework based on the random combination of data transformation operators
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Vision and Pattern Recognition
Abstract

Given a dataset P represented by an n×m matrix (where n is the number of data points and m is the number of attributes), we study the effect of applying transformations to P and how these transformations affect the performance of different ensemble algorithms. Specifically, a dataset P can be transformed into a new dataset P′ by a set of transformation operators Φ in the instance dimension, such as sub-sampling, super-sampling, and noise injection, and by a corresponding set of transformation operators Ψ in the attribute dimension. Based on these conventional operators Φ and Ψ, a general form Ω of the transformation operator is proposed to represent different kinds of transformation operators. Two new data transformation operators, the probabilistic-based data sampling operator and the probabilistic-based attribute sampling operator, are then designed to generate new datasets for the ensemble. Next, three new random transformation operators are proposed: the random combination of transformation operators in the data dimension, in the attribute dimension, and in both dimensions. Finally, a new cluster ensemble approach is proposed that integrates the random combination of data transformation operators across different dimensions, a hybrid clustering technique, a confidence measure, and the normalized cut algorithm into the ensemble framework. The experiments show that (i) the random combination of transformation operators across different dimensions outperforms most of the conventional data transformation operators on different kinds of datasets, and (ii) the proposed cluster ensemble framework performs well on different datasets, such as gene expression datasets and datasets from the UCI machine learning repository.
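To make the idea of transformation operators concrete, here is a minimal sketch of how operators in the instance dimension (Φ) and the attribute dimension (Ψ) could be applied to an n×m dataset and randomly combined. It is an illustrative assumption, not the authors' method: the function names, the sampling fractions, the noise level, and the "keep each row/column with probability p" reading of the two probabilistic-based operators are all hypothetical, and the general operator Ω is modelled loosely as the composition ψ(φ(P)).

```python
import random
from typing import Callable, List

# A dataset is an n x m matrix: n data points (rows), m attributes (columns).
Matrix = List[List[float]]
Operator = Callable[[Matrix, random.Random], Matrix]


def sub_sample(P: Matrix, rng: random.Random, frac: float = 0.8) -> Matrix:
    """Instance-dimension operator: keep a random subset of rows, without replacement."""
    k = max(1, int(frac * len(P)))
    return [row[:] for row in rng.sample(P, k)]


def super_sample(P: Matrix, rng: random.Random, frac: float = 1.2) -> Matrix:
    """Instance-dimension operator: draw rows with replacement, possibly more than n."""
    k = max(1, int(frac * len(P)))
    return [rng.choice(P)[:] for _ in range(k)]


def inject_noise(P: Matrix, rng: random.Random, sigma: float = 0.05) -> Matrix:
    """Instance-dimension operator: add Gaussian noise to every value (shape unchanged)."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in P]


def probabilistic_data_sampling(P: Matrix, rng: random.Random, p: float = 0.7) -> Matrix:
    """Hypothetical reading: keep each row independently with probability p."""
    kept = [row[:] for row in P if rng.random() < p]
    return kept or [rng.choice(P)[:]]  # never return an empty dataset


def probabilistic_attribute_sampling(P: Matrix, rng: random.Random, q: float = 0.7) -> Matrix:
    """Hypothetical reading: keep each column independently with probability q."""
    m = len(P[0])
    cols = [j for j in range(m) if rng.random() < q] or [rng.randrange(m)]
    return [[row[j] for j in cols] for row in P]


# Hypothetical operator pools; PSI holds a single attribute operator only for brevity.
PHI: List[Operator] = [sub_sample, super_sample, inject_noise, probabilistic_data_sampling]
PSI: List[Operator] = [probabilistic_attribute_sampling]


def random_combination(P: Matrix, rng: random.Random) -> Matrix:
    """Apply one randomly chosen operator from each dimension: P' = psi(phi(P))."""
    phi = rng.choice(PHI)
    psi = rng.choice(PSI)
    return psi(phi(P, rng), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    P = [[float(i), float(i % 3), float(i % 5)] for i in range(10)]  # toy 10 x 3 dataset
    for _ in range(3):
        Q = random_combination(P, rng)
        print(f"{len(Q)} x {len(Q[0])}")
```

Each call produces a differently shaped P′, which is what gives ensemble members their diversity. A real pipeline would also track the original row indices of each P′ so that cluster labels can be mapped back to the n original points before they are combined.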


► A new hybrid cluster ensemble framework.
► The random combination of data transformation operators.
► A number of newly designed cluster validity indices.
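The abstract's final step feeds the ensemble into the normalized cut algorithm. As a hedged illustration of that step, the sketch below uses a co-association matrix (the fraction of ensemble members that place two points in the same cluster), which is a common cluster-ensemble device chosen here as an assumption, together with the standard k-way normalized cut objective, the sum over clusters C of cut(C, V − C) / assoc(C, V). It does not reproduce the paper's hybrid clustering technique or confidence measure, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def co_association(labelings: Sequence[Sequence[int]]) -> Matrix:
    """A[i][j] = fraction of ensemble members that put points i and j in the same cluster."""
    n = len(labelings[0])
    A = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    A[i][j] += 1.0
    h = len(labelings)
    return [[a / h for a in row] for row in A]


def normalized_cut(A: Matrix, labels: Sequence[int]) -> float:
    """k-way Ncut of a partition: sum over clusters C of cut(C, V - C) / assoc(C, V)."""
    n = len(A)
    total = 0.0
    for c in set(labels):
        inside = [i for i in range(n) if labels[i] == c]
        outside = [j for j in range(n) if labels[j] != c]
        cut = sum(A[i][j] for i in inside for j in outside)
        assoc = sum(A[i][j] for i in inside for j in range(n))
        if assoc > 0:
            total += cut / assoc
    return total


if __name__ == "__main__":
    # Three hypothetical ensemble members over 4 points; label ids need not match across members.
    ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
    A = co_association(ensemble)
    print(normalized_cut(A, [0, 0, 1, 1]))  # 0.0: the partition every member agrees on
    print(normalized_cut(A, [0, 1, 0, 1]))  # 1.0: a partition that cuts consensus links
```

Because the co-association matrix only records whether two points share a cluster, it is invariant to how each member numbers its clusters (the third member above uses swapped ids). A normalized cut solver would search for the partition that minimizes this objective on A; the sketch only evaluates the objective for given partitions.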

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition - Volume 45, Issue 5, May 2012, Pages 1826–1837