Article code: 382456
Journal code: 660763
Publication year: 2016
English article: 14 pages
Full text: PDF
English title (ISI article)
Data heterogeneity consideration in semi-supervised learning
Persian translation of the title
نگرش داده ها در یادگیری نیمه نظارتی (literally: "A data perspective in semi-supervised learning")
Keywords
Semi-supervised learning, Graph construction, Complex networks, Representative selection, Principal component analysis
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• The heterogeneous nature of data is considered in the machine learning context.
• An adaptive data graph construction method is proposed.
• Representative data identification has been studied.

In the class (cluster) formation process of machine learning techniques, data instances are usually assumed to be equally relevant. However, this is frequently not the case. The situation is even more typical in semi-supervised learning, because the data structure of both labeled and unlabeled data has to be understood at the same time. In this paper, we investigate the organizational heterogeneity of data in semi-supervised learning using a graph representation. A graph is a natural choice for characterizing the relationship between any pair of nodes or any pair of groups of nodes; consequently, the strategic location of each node or group of nodes can be determined by graph measures. Specifically, two issues are addressed: (1) We propose an adaptive graph construction method, called AdaRadius, that takes into account the heterogeneity of the local interaction structure among nodes. As a result, it exhibits several interesting properties, namely adaptability to variations in data density, low dependence on parameter settings, and reasonable computational cost, for both pool-based and incremental data. (2) We also present heuristic criteria for selecting representative data samples to be labeled. The experimental study shows that selective labeling usually yields better classification results than random labeling. To our knowledge, neither issue has been investigated so far; therefore, our approach is an important step toward characterizing data heterogeneity, not only in semi-supervised learning but also in machine learning in general.
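The abstract does not describe how AdaRadius actually chooses its radii, so the following is only a generic illustration of the idea of a density-adaptive graph, not the authors' algorithm. In this hypothetical sketch, each node's radius is its distance to its k-th nearest neighbour (the function name adaptive_radius_graph and the parameter k are assumptions), and two nodes are linked when their distance is within the larger of their two radii, so dense and sparse regions each receive edges at their own scale.

```python
import math
from typing import List, Sequence, Set

Point = Sequence[float]


def adaptive_radius_graph(points: List[Point], k: int = 2) -> List[Set[int]]:
    """Hypothetical density-adaptive radius graph (illustration only, NOT AdaRadius).

    Each node's local radius is its distance to its k-th nearest neighbour,
    so nodes in dense regions get small radii and nodes in sparse regions
    get large ones. Nodes i and j are connected when their distance does
    not exceed the larger of their two radii.
    """
    n = len(points)
    # Full pairwise distance matrix: O(n^2), fine for a small illustration.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Index 0 of each sorted row is the node itself (distance 0),
    # so index k is the k-th nearest neighbour; clamp for tiny inputs.
    radius = [sorted(row)[min(k, n - 1)] for row in dist]
    adj: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= max(radius[i], radius[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


if __name__ == "__main__":
    # One dense cluster and one sparse cluster.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (10.0, 10.0), (12.0, 10.0), (10.0, 12.0)]
    print(adaptive_radius_graph(pts, k=2))
```

On this toy input, a single global radius of 0.5 would connect the dense cluster but leave every node of the sparse cluster isolated, whereas the per-node radii connect both clusters internally without bridging them. This is the kind of sensitivity to density variations that the abstract attributes to fixed-parameter graph construction.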

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 45, 1 March 2016, Pages 234–247
Authors
, ,