کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
532130 869910 2014 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
New and efficient DCA based algorithms for minimum sum-of-squares clustering
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله
New and efficient DCA based algorithms for minimum sum-of-squares clustering
چکیده انگلیسی


• The purpose of this paper is to develop new efficient approaches based on DC programming and DCA for clustering.
• We consider the two models for the Minimum Sum-of-Squares Clustering (MSSC): a mixed integer program and a bilevel programming formulation.
• The mixed integer program is reformulated as a DC program via an exact penalty technique for which DCA is investigated.
• A Gaussian kernel version of the bilevel programming formulation is introduced and a simple and efficient DCA scheme is developed.
• The two main features of DCA, say the effect of DC decomposition and of starting points, are carefully studied.

The purpose of this paper is to develop new efficient approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) to perform clustering via minimum sum-of-squares Euclidean distance. We consider the two most widely used models for the so-called Minimum Sum-of-Squares Clustering (MSSC in short) that are a bilevel programming problem and a mixed integer program. Firstly, the mixed integer formulation of MSSC is carefully studied and is reformulated as a continuous optimization problem via a new result on exact penalty technique in DC programming. DCA is then investigated to the resulting problem. Secondly, we introduce a Gaussian kernel version of the bilevel programming formulation of MSSC, named GKMSSC. The GKMSSC problem is formulated as a DC program for which a simple and efficient DCA scheme is developed. A regularization technique is investigated for exploiting the nice effect of DC decomposition and a simple procedure for finding good starting points of DCA is developed. The proposed DCA schemes are original and very inexpensive because they amount to computing, at each iteration, the projection of points onto a simplex and/or onto a ball, and/or onto a box, which are all determined in the explicit form. Numerical results on real word datasets show the efficiency, the scalability of DCA and its great superiority with respect to k-means and kernel k-means, standard methods for clustering.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition - Volume 47, Issue 1, January 2014, Pages 388–401
نویسندگان
, , ,