Article code: 383105
Journal code: 660802
Publication year: 2016
English full text: 12 pages, PDF
English title of the ISI article
Using dynamical systems tools to detect concept drift in data streams
Keywords
Concept drift; Data streams; Dynamical systems; Chaos; Unsupervised learning
Related subjects
Engineering and Basic Sciences; Computer Engineering; Artificial Intelligence
English abstract


• A new approach to detecting concept drift in data streams;
• Our approach, based on dynamical systems and chaos theory, outperforms traditional ones;
• Modeling both determinism and stochasticity improves concept drift detection;
• Results confirm that the proposed algorithms detect most of the behavior changes;
• The proposed algorithms outperform traditional algorithms from the literature.

Real-world data streams may change their behavior over time, a phenomenon referred to as concept drift. By detecting such changes, researchers obtain relevant information about the phenomena that produced the streams (e.g., temperatures in a region, bacteria populations, disease occurrences). Many concept drift detection algorithms rely on supervised or semi-supervised approaches, which tend to be infeasible when data are collected at high frequencies because labeling is difficult. In addition, current studies usually assume the data to be statistically independent and identically distributed, disregarding any temporal relationship among observations and, consequently, risking the quality of the data model. To tackle both aspects, we employ dynamical system modeling to represent the temporal relationships among data observations and how they change over time, in an attempt to detect concept drift. This approach relies on Takens' embedding theorem to unfold consecutive windows of data observations into the phase space, so that their time dependencies can be represented and compared. From this perspective, we propose four new concept drift detection algorithms based on the unsupervised machine learning paradigm. The first algorithm builds dendrograms of consecutive phase spaces (each phase space represents the time relationships among the observations in a particular data window) and compares them using the Gromov–Hausdorff distance, which provides sufficient guarantees to detect concept drift. The second algorithm employs the Cross Recurrence Plot and Recurrence Quantification Analysis to detect relevant changes between consecutive phase spaces and warn about relevant data modifications. We also preprocess data windows with the Empirical Mode Decomposition method and Mutual Information in an attempt to take only the deterministic behavior of the stream into account.
All algorithms were implemented as plugins for the Massive Online Analysis (MOA) software and then compared to well-known algorithms from the literature. Results confirm that the proposed algorithms detect most of the behavior changes while producing few false alarms.
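The phase-space unfolding described in the abstract can be illustrated with a minimal, stdlib-only Python sketch of delay-coordinate embedding, the construction that Takens' theorem justifies. It assumes a scalar stream already split into windows, a fixed embedding dimension `m`, and a delay `tau` chosen as the first local minimum of a histogram-based mutual information estimate. That delay heuristic is a common choice but an assumption here; the abstract does not say how the paper picks `m` or `tau`, or exactly how it applies Mutual Information. The names `delay_embed`, `mutual_information`, and `first_mi_minimum` are hypothetical and do not come from the paper or from MOA.

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], m: int, tau: int) -> List[List[float]]:
    """Unfold a 1-D window into m-dimensional delay vectors
    [x(t), x(t + tau), ..., x(t + (m - 1) * tau)]."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("window too short for this (m, tau)")
    return [[x[t + k * tau] for k in range(m)] for t in range(n)]


def mutual_information(x: Sequence[float], lag: int, bins: int = 8) -> float:
    """Histogram estimate (in nats) of I(x(t); x(t + lag)); requires lag >= 1."""
    a, b = x[:-lag], x[lag:]
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid division by zero on constant input

    def bin_of(v: float) -> int:
        return min(int((v - lo) / width), bins - 1)

    n = len(a)
    joint = {}
    pa = [0] * bins
    pb = [0] * bins
    for u, v in zip(a, b):
        i, j = bin_of(u), bin_of(v)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        pa[i] += 1
        pb[j] += 1
    # sum of p(i,j) * log(p(i,j) / (p(i) * p(j))), written with raw counts
    return sum((c / n) * math.log(c * n / (pa[i] * pb[j]))
               for (i, j), c in joint.items())


def first_mi_minimum(x: Sequence[float], max_lag: int = 20, bins: int = 8) -> int:
    """Delay heuristic: first lag where the MI curve has a local minimum."""
    mis = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mis) - 1):
        if mis[k] < mis[k - 1] and mis[k] <= mis[k + 1]:
            return k + 1  # mis[k] corresponds to lag k + 1
    return 1  # no local minimum found: fall back to tau = 1
```

For example, a window such as `[math.sin(0.3 * t) for t in range(200)]` yields a delay between 1 and 20 from `first_mi_minimum`, and `delay_embed(window, 3, tau)` then produces `200 - 2 * tau` phase-space points, one per usable time step.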

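The comparison at the heart of the second algorithm, a Cross Recurrence Plot between two consecutive windows summarized by a Recurrence Quantification Analysis measure, can also be sketched in a few lines of stdlib-only Python. This is an illustration under stated assumptions, not the paper's detector: it assumes Euclidean distance, a fixed recurrence radius `eps`, the recurrence rate as the only RQA measure, and a hypothetical threshold rule (`min_rate`). The abstract does not specify which RQA measures, radii, or thresholds the paper uses, and every function name below is illustrative.

```python
import math
from typing import List, Sequence

Point = List[float]


def delay_embed(x: Sequence[float], m: int, tau: int) -> List[Point]:
    """Delay vectors [x(t), x(t + tau), ..., x(t + (m - 1) * tau)]."""
    n = len(x) - (m - 1) * tau
    if n <= 0:
        raise ValueError("window too short for this (m, tau)")
    return [[x[t + k * tau] for k in range(m)] for t in range(n)]


def cross_recurrence(a: List[Point], b: List[Point], eps: float) -> List[List[int]]:
    """CR[i][j] = 1 when state a_i and state b_j lie within eps (Euclidean)."""
    return [[1 if math.dist(p, q) <= eps else 0 for q in b] for p in a]


def recurrence_rate(cr: List[List[int]]) -> float:
    """Simplest RQA measure: fraction of recurrent entries in the plot."""
    total = sum(len(row) for row in cr)
    return sum(sum(row) for row in cr) / total


def drift_suspected(prev: Sequence[float], curr: Sequence[float],
                    m: int = 3, tau: int = 1,
                    eps: float = 0.5, min_rate: float = 0.05) -> bool:
    """Hypothetical rule: flag drift when the current window's trajectory
    rarely comes close to the previous window's trajectory in phase space."""
    cr = cross_recurrence(delay_embed(prev, m, tau),
                          delay_embed(curr, m, tau), eps)
    return recurrence_rate(cr) < min_rate
```

With these defaults, a window that continues the same sine regime as `prev = [math.sin(0.3 * t) for t in range(100)]` is not flagged, while the same signal shifted upward by 5 is flagged: every coordinate of a shifted state then differs by at least 3, so no pair of states falls within `eps` and the recurrence rate drops to zero.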
Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 60, 30 October 2016, Pages 39–50
Authors
, , ,