Free article download: A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset

Article code	Journal code	Publication year	English article	Full text
469075	698284	2016	16-page PDF	Free download

English title of the ISI article

A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset

Keywords

MapReduce; K-nearest neighbor; Big data; DNA (deoxyribonucleic acid); Computational biology; Imbalanced data

Related topics

Engineering and basic sciences; Computer engineering; Computer science (general)

Abstract

• Imbalanced data sets are considered a special case of classification problems.
• MapReduce with prototype reduction can handle large-scale data sets with good speedup and lower running time.
• For high quality, four reduction types were compared: data cleaning, rule aggregation, rule synthesis, and rule update.
• A real DNA data set consisting of 90 million pairs was used with the reduction types.
• The proposed MapReduce-based K-NN classifier reduced the imbalanced data set and achieved accurate results for the DNA data.

Background: In the age of information superhighway, big data play a significant role in information processing, extractions, retrieving and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential.

Method: In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with reduced amount of elements or instances. These reduced amounts of data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches.

Results: To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy.

Conclusions: The obtained results depict that MapReduce based K-NN classifier provided accurate results for big data of DNA.
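The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general pattern it describes: split the training data into partitions, reduce each partition independently (map), merge the surviving prototypes (reduce), and classify with K-NN on the reduced set. Condensed nearest neighbour is swapped in here as a standard prototype-reduction rule and is not claimed to be the authors' method. The function names, the Hamming distance on equal-length sequences, and the round-robin partitioning are all assumptions made for illustration, and the sketch does not address class imbalance specifically.

```python
from collections import Counter
from functools import reduce


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(prototypes, seq, k=1):
    """Majority label among the k prototypes closest to seq."""
    nearest = sorted(prototypes, key=lambda p: hamming(p[0], seq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def condense(partition):
    """Map step (assumed rule): one pass of condensed nearest neighbour.

    An instance is kept only if the prototypes kept so far misclassify it.
    """
    kept = [partition[0]]
    for seq, label in partition[1:]:
        if knn_predict(kept, seq, k=1) != label:
            kept.append((seq, label))
    return kept


def merge(a, b):
    """Reduce step: concatenate the prototype lists of two mappers."""
    return a + b


def reduce_dataset(data, n_partitions):
    """Run the map and reduce steps sequentially over round-robin partitions."""
    partitions = [data[i::n_partitions] for i in range(n_partitions)]
    partitions = [p for p in partitions if p]  # skip empty partitions
    return reduce(merge, map(condense, partitions), [])
```

Calling `reduce_dataset(data, n_partitions)` on a list of `(sequence, label)` pairs returns the merged prototypes, which can then be passed to `knn_predict`. In a real deployment the `map` call would be distributed across worker nodes rather than run in one process.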

Publisher

Database: Elsevier - ScienceDirect
Journal: Computer Methods and Programs in Biomedicine - Volume 131, July 2016, Pages 191–206

Authors

Sarwar Kamal, Shamim Hasnat Ripon, Nilanjan Dey, Amira S. Ashour, V. Santhi
