Article code: 465782
Journal code: 697691
Publication year: 2014
English article: 16-page PDF
Full-text version: Free download
English title of the ISI article
On classification in the case of a medical data set with a complicated distribution
Keywords
Data mining, data cleaning, complicated data distribution, classification
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

In an earlier study, we noticed that straightforward cleaning of our medical data set considerably impaired its classification results with some, though not all, machine learning methods. This was unexpected and counterintuitive compared with the original situation without any data cleaning. A closer exploration of the data showed that the cause was its complicated variable distribution, even though it contained only two classes. In addition to the straightforward cleaning method, we used an efficient technique called neighbourhood cleaning, which solved the problem and improved our classification accuracies by 5–10%, reaching at best up to 95% of all test cases. This shows how important it is to first study the distributions of the data sets to be classified very carefully and to try different cleaning techniques in order to obtain the best classification results.

Publisher
Database: Elsevier - ScienceDirect
Journal: Applied Computing and Informatics - Volume 10, Issues 1–2, January 2014, Pages 52–67