Article code: 4950512
Journal code: 1440646
Publication year: 2017
English article: 37-page PDF
Full-text version: free download
English title of the ISI article
Exploratory analysis of textual data streams
Persian translation of the title
تجزیه و تحلیل اکتشافی از جریان داده متنی
Keywords
Text data stream clustering; exploratory analysis; topic evolution; prominent topic detection
Related subjects
Physical Sciences and Engineering > Computer Science > Computational Theory and Mathematics
English abstract
In this paper, we address the exploratory analysis of textual data streams and propose a bootstrapping process that combines keyword similarity and clustering techniques to: (i) classify documents into fine-grained similarity clusters based on keyword commonalities; (ii) aggregate similar clusters into larger document collections that share a richer, more user-prominent keyword set, which we call a topic; (iii) assimilate the newly extracted topics of the current bootstrapping cycle with the topics resulting from previous cycles by linking similar topics of different time periods, if any, to highlight topic trends and evolution. An analysis framework is also defined that enables topic-based exploration of the underlying textual data stream from both a thematic and a temporal perspective. The bootstrapping process is evaluated on a real data stream of about 330,000 newspaper articles about politics published by the New York Times from January 1, 1900 to December 31, 2015.
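The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the clustering algorithm, so Jaccard similarity and a greedy single-pass grouping are used here as stand-ins. All function names, parameters, and threshold values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword sets (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_by_similarity(keyword_sets, threshold):
    """Greedy single-pass grouping (assumed technique): each item joins the
    first group whose pooled keywords are similar enough, otherwise it
    starts a new group."""
    groups = []  # each group: {"members": [indices], "keywords": set}
    for i, kws in enumerate(keyword_sets):
        for g in groups:
            if jaccard(kws, g["keywords"]) >= threshold:
                g["members"].append(i)
                g["keywords"] |= set(kws)
                break
        else:
            groups.append({"members": [i], "keywords": set(kws)})
    return groups


def bootstrap_cycle(docs, previous_topics,
                    doc_thr=0.5, topic_thr=0.2, link_thr=0.3):
    """One hypothetical bootstrapping cycle over the documents of a period.

    docs: list of keyword sets, one per document.
    previous_topics: list of keyword sets from earlier cycles.
    Returns (topics, links), where links are (old_index, new_index) pairs.
    """
    # (i) fine-grained similarity clusters based on keyword commonalities
    clusters = group_by_similarity(docs, doc_thr)
    # (ii) aggregate similar clusters into topics with a richer keyword set
    merged = group_by_similarity([c["keywords"] for c in clusters], topic_thr)
    topics = [t["keywords"] for t in merged]
    # (iii) link new topics to similar topics of previous periods, if any
    links = [(p, n)
             for p, old in enumerate(previous_topics)
             for n, new in enumerate(topics)
             if jaccard(old, new) >= link_thr]
    return topics, links


# Toy data: two periods of a politics news stream (invented for illustration).
period_1 = [{"election", "senate", "vote"},
            {"election", "vote", "ballot"},
            {"war", "treaty", "army"}]
period_2 = [{"election", "vote", "campaign"},
            {"tax", "budget"}]

topics_1, links_1 = bootstrap_cycle(period_1, previous_topics=[])
topics_2, links_2 = bootstrap_cycle(period_2, previous_topics=topics_1)
print(links_2)  # [(0, 0)]
```

In this toy run, the election topic of the second period is linked to the election topic of the first, which is the kind of cross-period link the abstract describes as highlighting topic evolution. The thresholds are arbitrary, and a real system would need a keyword extraction step that is outside the scope of this sketch.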
Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 68, March 2017, Pages 391-406