Free article download: Parallel sampling from big data with uncertainty distribution

Article code	Journal code	Publication year	English article	Full-text version
389859	661185	2015	17-page PDF	Free download

English title of the ISI article

Parallel sampling from big data with uncertainty distribution

Persian translation of the title

نمونه برداری موازی از داده های بزرگ با توزیع نامشخص

Keywords

Uncertainty; Sampling; MapReduce

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

English abstract

Data are inherently uncertain in most applications. Uncertainty arises when an experiment such as sampling is about to proceed and its result is not yet known, so that a variety of outcomes is possible. With the rapid development of data collection and distributed storage technologies, big data has become a bigger problem than ever, and dealing with big data with uncertainty distribution is one of the most important issues in big data research. In this paper, we propose a Parallel Sampling method based on Hyper Surface for big data with uncertainty distribution, namely PSHS, which adopts the universal concept of the Minimal Consistent Subset (MCS) of Hyper Surface Classification (HSC). Our approach to handling uncertainty in sampling from big data rests on three observations: (1) the inherent structure of the original sample set is uncertain to us, (2) the boundary set formed by all possible separating hyper surfaces is a fuzzy set, and (3) the elements of the MCS are themselves uncertain. PSHS is implemented on the MapReduce framework, a powerful parallel programming technique in current use in many fields. Experiments were carried out on several data sets, including real-world data from the UCI repository and synthetic data. The results show that our algorithm shrinks data sets while maintaining an identical distribution, which is useful for obtaining the inherent structure of the data sets. Furthermore, the evaluation criteria of speedup, scaleup and sizeup validate its efficiency.
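The abstract names speedup, scaleup and sizeup but this page does not define them. The sketch below assumes the definitions commonly used when evaluating parallel data mining algorithms; the paper's exact formulas may differ. The function names and all timings are hypothetical and chosen only for illustration.

```python
def speedup(t_one_node: float, t_m_nodes: float) -> float:
    """Same data set, run time on 1 node divided by run time on m nodes.

    Under linear speedup the value would equal m.
    """
    return t_one_node / t_m_nodes


def scaleup(t_one_node_base: float, t_m_nodes_scaled: float) -> float:
    """Run time of 1 node on the base data divided by the run time of
    m nodes on m times the data. A value near 1 means the system absorbs
    proportional growth in data and nodes.
    """
    return t_one_node_base / t_m_nodes_scaled


def sizeup(t_base: float, t_scaled: float) -> float:
    """Fixed number of nodes: run time on m times the data divided by the
    run time on the base data. It shows how much longer the larger job takes.
    """
    return t_scaled / t_base


# Hypothetical wall-clock times in seconds (not taken from the paper).
print(speedup(120.0, 40.0))   # 1 node vs. several nodes, same data
print(scaleup(120.0, 150.0))  # 1 node on D vs. m nodes on m*D
print(sizeup(40.0, 100.0))    # same nodes, D vs. m*D
```

Each function takes measured run times rather than running a job itself, so the same helpers could be applied to timings from any MapReduce deployment.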

Publisher

Database: Elsevier - ScienceDirect
Journal: Fuzzy Sets and Systems - Volume 258, 1 January 2015, Pages 117–133

Authors

Qing He, Haocheng Wang, Fuzhen Zhuang, Tianfeng Shang, Zhongzhi Shi
