Article code: 478206
Journal code: 1446033
Publication year: 2014
English article: PDF, 11 pages
English title of the ISI article
Mining categorical sequences from data using a hybrid clustering method
Keywords
Data mining, sequential data, hidden Markov models, clustering of categorical data
Related subjects
Engineering and Basic Sciences; Computer Engineering; Computer Science (General)
English abstract


• We propose an innovative hybrid method for clustering categorical sequential data.
• The algorithm combines model-based and hierarchical clustering approaches.
• We are able to cluster sequences without losing information about their dynamics.
• In the first application the method detects 2 distinct search patterns among web users.
• In the second application the method recovers 5 clusters of diverse job careers.

The identification of different dynamics in sequential data has become an everyday need in scientific fields such as marketing, bioinformatics, finance, and the social sciences. Unlike cross-sectional or static data, these observations (also known as stream data, temporal data, longitudinal data, or repeated measures) are more challenging to cluster, because the dependency between observations must be incorporated into the clustering process. In this research we focus on clustering categorical sequences. The proposed method combines model-based and heuristic clustering. In the first step, an extension of the hidden Markov model maps the categorical sequences into a probabilistic space in which a symmetric Kullback–Leibler distance can be computed. In the second step, hierarchical clustering is applied to the resulting distance matrix to group the sequences. This paper illustrates the potential of this hybrid approach on a synthetic data set, the well-known Microsoft data set of website users' search patterns, and a survey on job career dynamics.
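The two-step pipeline described in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a smoothed first-order Markov chain fitted to each sequence stands in for the paper's hidden Markov model extension, the symmetric Kullback–Leibler distance is approximated from per-symbol cross log-likelihoods (how well one sequence's model explains another sequence), and average linkage is an assumed choice of hierarchical linkage. All function names (`fit_markov_chain`, `symmetric_kl_matrix`, `average_linkage`) and the toy sequences are hypothetical.

```python
import math
from itertools import combinations


def fit_markov_chain(seq, alphabet, alpha=1.0):
    """Fit a first-order Markov chain with add-alpha smoothing.

    Hypothetical stand-in for the paper's HMM extension: returns
    (initial, transition) probability dicts over a shared alphabet.
    """
    k = len(alphabet)
    init = {s: alpha for s in alphabet}
    init[seq[0]] += 1
    counts = {a: {b: alpha for b in alphabet} for a in alphabet}
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1
    initial = {s: c / (1 + alpha * k) for s, c in init.items()}
    transition = {}
    for a, row in counts.items():
        total = sum(row.values())
        transition[a] = {b: c / total for b, c in row.items()}
    return initial, transition


def log_likelihood(seq, model):
    """Log-probability of a sequence under a fitted Markov chain."""
    initial, transition = model
    ll = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        ll += math.log(transition[prev][cur])
    return ll


def symmetric_kl_matrix(seqs, alpha=1.0):
    """Step 1: pairwise likelihood-based symmetric KL distances (per symbol)."""
    alphabet = sorted({s for seq in seqs for s in seq})
    models = [fit_markov_chain(seq, alphabet, alpha) for seq in seqs]
    n = len(seqs)
    # ll[i][j] = log-likelihood of sequence i under the model fitted to sequence j
    ll = [[log_likelihood(seqs[i], models[j]) for j in range(n)] for i in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d_ij = (ll[i][i] - ll[i][j]) / len(seqs[i])
        d_ji = (ll[j][j] - ll[j][i]) / len(seqs[j])
        # Smoothed models are not exact maximum-likelihood fits, so a
        # sequence's own model may not always win; clip negatives at zero.
        dist[i][j] = dist[j][i] = max(0.0, (d_ij + d_ji) / 2)
    return dist


def average_linkage(dist, n_clusters):
    """Step 2: agglomerative (average-linkage) clustering of the distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
            avg = sum(pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    # Label clusters 0..k-1 in order of their smallest member index
    labels = [0] * len(dist)
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Toy data: three alternating sequences and three "sticky" sequences
    seqs = ["abababababab", "babababababa", "ababababab",
            "aaaaaabbbbbb", "aaaaaaaabbbb", "bbbbbbaaaaaa"]
    labels = average_linkage(symmetric_kl_matrix(seqs), n_clusters=2)
    print(labels)  # expected: [0, 0, 0, 1, 1, 1]
```

Because sequences are compared through their fitted models rather than position by position, sequences of different lengths (such as the 10-symbol third sequence) can be clustered together, and dividing each log-likelihood gap by the sequence length keeps the distance from growing simply because a sequence is longer.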

Publisher
Database: Elsevier - ScienceDirect
Journal: European Journal of Operational Research - Volume 234, Issue 3, 1 May 2014, Pages 720–730
Authors