Article code: 1180835
Journal code: 1491543
Publication year: 2014
English article: 16-page PDF
Full-text version: free download
English title of the ISI article
Visualizing Big data with Compressed Score Plots: Approach and research challenges
Keywords
Exploratory data analysis; Big data; Principal component analysis; Partial least squares; Score plot
Related topics
Engineering and basic sciences; Chemistry; Analytical chemistry
English abstract


• Score plots are extended to visualize Big data
• A procedure to compute Compressed Score Plots based on clustering is defined
• A procedure to update Compressed Score Plots based on approximations is defined
• The framework allows millions of observations to be explored in seconds
• The framework is demonstrated with several real data sets

Exploratory Data Analysis (EDA) can be defined as the initial exploration of a data set with the aim of generating a hypothesis of interest. Projection models based on latent structures, together with their associated visualization techniques, are valuable tools within EDA. In particular, score plots are a key tool for discovering patterns in the observations. This paper addresses the extension of score plots to very large data sets with an unlimited number of observations. The proposed solution, based on clustering and approximation techniques, is referred to as Compressed Score Plots (CSPs). The approach is designed to handle both high-volume data sets and high-velocity data streams. The objective is to retain the visualization capabilities of traditional score plots while making the user-supervised analysis of huge data sets affordable on a time scale similar to that of small data sets. Efficient processing and updating approaches, visualization techniques, performance measures and challenges for future research are identified throughout the paper. The approach is illustrated with several data sets, including one with five million observations and more than one hundred variables.
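
The general idea in the abstract, replacing individual score points with a small set of cluster representatives and then updating those representatives as new data arrive, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the procedure defined in the paper: plain k-means stands in for whatever clustering the authors use, a running-mean rule stands in for their approximate update, and the function names `pca_scores`, `compress_scores` and `update_compressed` are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Mean-center X and return (scores, loadings, mean) from an SVD-based PCA."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    return Xc @ P, P, mean

def compress_scores(T, n_clusters=50, n_iter=20, seed=0):
    """Assumed stand-in for the compression step: plain k-means in score space.

    Returns the centroids and the number of observations each one represents.
    """
    rng = np.random.default_rng(seed)
    centroids = T[rng.choice(len(T), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every score to every centroid.
        d = ((T[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            members = T[labels == k]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[k] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=n_clusters)
    return centroids, counts

def update_compressed(centroids, counts, T_new):
    """Assumed stand-in for the update step: fold each new score into its
    nearest centroid with a running-mean update, without re-clustering."""
    centroids = centroids.copy()
    counts = counts.copy()
    for t in T_new:
        k = ((centroids - t) ** 2).sum(axis=1).argmin()
        counts[k] += 1
        centroids[k] += (t - centroids[k]) / counts[k]
    return centroids, counts
```

New observations would be projected with the stored mean and loadings, `(X_new - mean) @ P`, before being passed to `update_compressed`. A compressed plot could then draw one marker per centroid, with marker size scaled by its count, so the number of plotted points stays fixed however many observations are processed.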

Publisher
Database: Elsevier - ScienceDirect
Journal: Chemometrics and Intelligent Laboratory Systems - Volume 135, 15 July 2014, Pages 110–125