Free article download: Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin

Article code	Journal code	Publication year	English article	Full-text version
555961	1451269	2015	14-page PDF	Free download

English title of the ISI article

Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin

Persian translation of the title

بررسی مسائل مربوط به عدم تعادل داده ها آموزش و برچسب اشتباه در عملکرد جنگل های تصادفی برای طبقه بندی پوشش زمین های بزرگ با استفاده از حاشیه گروه

Keywords

Ensemble margin; Training data; Classification; Remote sensing; Imbalance; Mislabelling

Related topics

Engineering and Basic Sciences; Computer Engineering; Information Systems

Article preview

Abstract

Studies have demonstrated the robust performance of the ensemble machine learning classifier, random forests, for remote sensing land cover classification, particularly across complex landscapes. This study introduces new ensemble margin criteria to evaluate the performance of Random Forests (RF) in the context of large area land cover classification and examines the effect of different training data characteristics (imbalance and mislabelling) on classification accuracy and uncertainty. The study presents a new margin weighted confusion matrix, which used in combination with the traditional confusion matrix, provides confidence estimates associated with correctly and misclassified instances in the RF classification model. Landsat TM satellite imagery, topographic and climate ancillary data are used to build binary (forest/non-forest) and multiclass (forest canopy cover classes) classification models, trained using sample aerial photograph maps, across Victoria, Australia. Experiments were undertaken to reveal insights into the behaviour of RF over large and complex data, in which training data are not evenly distributed among classes (imbalance) and contain systematically mislabelled instances. Results of experiments reveal that while the error rate of the RF classifier is relatively insensitive to mislabelled training data (in the multiclass experiment, overall 78.3% Kappa with no mislabelled instances to 70.1% with 25% mislabelling in each class), the level of associated confidence falls at a faster rate than overall accuracy with increasing amounts of mislabelled training data. In general, balanced training data resulted in the lowest overall error rates for classification experiments (82.3% and 78.3% for the binary and multiclass experiments respectively). However, results of the study demonstrate that imbalance can be introduced to improve error rates of more difficult classes, without adversely affecting overall classification accuracy.
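The abstract does not spell out how the ensemble margin or the margin weighted confusion matrix is computed. As a rough illustration only, the sketch below assumes the commonly used supervised margin (votes for the true class minus the largest vote count for any other class, divided by the total number of votes) and assumes that each confusion matrix cell is weighted by the mean absolute margin of the instances it contains. The function names, the toy vote counts and the weighting scheme are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def ensemble_margin(votes, true_label):
    """Supervised margin in [-1, 1] for one instance.

    votes maps class label -> number of trees voting for that class.
    Assumed definition: (votes for the true class - most votes for any
    other class) / total votes. Positive means correctly classified.
    """
    total = sum(votes.values())
    v_true = votes.get(true_label, 0)
    v_other = max((n for c, n in votes.items() if c != true_label), default=0)
    return (v_true - v_other) / total


def predicted_label(votes):
    """Majority vote; ties go to the first label in dict insertion order."""
    return max(votes, key=votes.get)


def confusion_matrices(vote_list, true_labels):
    """Return (counts, mean_abs_margin), both keyed by (true, predicted).

    The mean absolute margin per cell is a hypothetical stand-in for a
    margin weighted confusion matrix: a rough confidence per cell.
    """
    counts = defaultdict(int)
    margin_sums = defaultdict(float)
    for votes, y in zip(vote_list, true_labels):
        cell = (y, predicted_label(votes))
        counts[cell] += 1
        margin_sums[cell] += abs(ensemble_margin(votes, y))
    mean_abs_margin = {cell: margin_sums[cell] / counts[cell] for cell in counts}
    return dict(counts), mean_abs_margin


# Toy example: vote counts from a hypothetical 10-tree forest.
vote_list = [
    {"forest": 9, "non-forest": 1},
    {"forest": 6, "non-forest": 4},
    {"forest": 7, "non-forest": 3},
    {"forest": 2, "non-forest": 8},
]
true_labels = ["forest", "forest", "non-forest", "non-forest"]

counts, mean_abs_margin = confusion_matrices(vote_list, true_labels)
for cell in sorted(counts):
    print(cell, counts[cell], round(mean_abs_margin[cell], 3))
```

Read alongside the plain counts, a cell with a low mean margin marks decisions the forest made with little agreement among its trees, which is the kind of confidence information the abstract says falls faster than overall accuracy as mislabelling increases.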

Publisher

Database: Elsevier - ScienceDirect
Journal: ISPRS Journal of Photogrammetry and Remote Sensing - Volume 105, July 2015, Pages 155–168

Authors

Andrew Mellor, Samia Boukir, Andrew Haywood, Simon Jones
