Article code: 4950411
Journal code: 1440639
Publication year: 2017
English article: 13 pages, PDF
Full text: free download
English title of the ISI article
Scalable real-time classification of data streams with concept drift
Persian translation of the title
طبقه بندی زمان واقعی مقیاس پذیری جریان داده ها با راندگی مفهوم
Keywords
Parallel data stream classification, adaptation to concept drift, high-velocity data streams
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


- A real-time data stream classifier adaptive to concept drift and robust to noise.
- A parallel implementation of the real-time data stream classifier.
- A discussion about using open source Big Data technologies for data stream mining.

Inducing adaptive predictive models in real time from high-throughput data streams is one of the most challenging areas of Big Data Analytics. The fact that data streams may contain concept drifts (changes of the pattern encoded in the stream over time) and are unbounded imposes unique challenges in comparison with predictive data mining from batch data. Several real-time predictive data stream algorithms exist; however, most approaches are not naturally parallel and are thus limited in their scalability. This paper highlights the Micro-Cluster Nearest Neighbour (MC-NN) data stream classifier. MC-NN is based on statistical summaries of the data stream and a nearest neighbour approach, which makes MC-NN naturally parallel. In its serial version, MC-NN is able to handle data streams: the data does not need to reside in memory and is processed incrementally. MC-NN is also able to adapt to concept drifts. This paper provides an empirical study on the serial algorithm's speed, adaptivity and accuracy. Furthermore, this paper discusses the new parallel implementation of MC-NN and its parallel properties, and provides an empirical scalability study.
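
The abstract gives only the broad idea: per-class statistical summaries (micro-clusters) combined with a nearest-neighbour decision, updated incrementally so that raw instances need not be kept in memory. The Python sketch below illustrates that general idea under stated assumptions and is not the paper's MC-NN algorithm. The class names `MicroCluster` and `MicroClusterNN`, the choice of summary (a count and a per-feature linear sum), and the rule of opening a new micro-cluster whenever the nearest one carries a different label are all hypothetical simplifications chosen for illustration.

```python
import math


class MicroCluster:
    """Summary of absorbed instances: count, per-feature sum, class label."""

    def __init__(self, x, label):
        self.n = 1
        self.linear_sum = list(x)
        self.label = label

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, x):
        # Incremental update: the raw instance is discarded afterwards.
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]


class MicroClusterNN:
    """Simplified nearest-centroid stream classifier (illustrative only)."""

    def __init__(self):
        self.clusters = []

    def _nearest(self, x):
        # Nearest micro-cluster by Euclidean distance; None if no clusters yet.
        return min(
            self.clusters,
            key=lambda c: math.dist(c.centroid(), x),
            default=None,
        )

    def predict(self, x):
        nearest = self._nearest(x)
        return None if nearest is None else nearest.label

    def learn(self, x, label):
        nearest = self._nearest(x)
        if nearest is not None and nearest.label == label:
            nearest.absorb(x)
        else:
            # Assumed rule: a label mismatch starts a new micro-cluster.
            self.clusters.append(MicroCluster(x, label))


# Process a tiny labelled stream one instance at a time.
model = MicroClusterNN()
stream = [
    ([0.0, 0.0], "a"),
    ([0.2, 0.1], "a"),
    ([5.0, 5.0], "b"),
    ([5.1, 4.9], "b"),
]
for x, y in stream:
    model.learn(x, y)

print(model.predict([0.1, 0.0]))
```

Because each micro-cluster is a small, independent summary, the distance computations in `_nearest` could in principle be spread across workers, which is one plausible reading of why the abstract calls the approach naturally parallel. Handling concept drift would additionally require some way to retire or split stale micro-clusters, which this sketch omits.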

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 75, October 2017, Pages 187-199