Article code: 4949270
Journal code: 1440043
Publication year: 2017
English article: 39-page PDF
Full-text version: free download
English title of the ISI article
Gradient boosting for high-dimensional prediction of rare events
Persian translation of the title
تقویت گرادیان برای پیش بینی های حادثه ای نادرست
Keywords
Gradient boosting, rare events bias, regularization by shrinkage and subsampling, ensemble classifiers, high-dimensional class prediction
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract
In clinical research the goal is often to correctly estimate the probability of an event. For this purpose several characteristics of the patients are measured and used to develop a prediction model, which can then be used to predict the class membership of future patients. Ensemble classifiers are combinations of many different classifiers, and they can be useful because combining a set of classifiers can result in more accurate predictions. Gradient boosting is an ensemble classifier that was shown to perform well in the setting where the number of variables exceeds the number of samples (high-dimensional data); however, it has not been evaluated for the prediction of rare events. It is demonstrated that gradient boosting suffers from severe rare events bias, correctly classifying only a small proportion of samples from the rare class. The bias can be removed by using subsampling in combination with an appropriate amount of shrinkage, but only for a specific number of boosting iterations and for the binomial loss function. It is shown that the number of boosting iterations at which the rare events bias is removed cannot be estimated efficiently from the training data when the sample size is small. Therefore several corrections for the rare events bias of gradient boosting are proposed and evaluated using simulated and real high-dimensional data. It is demonstrated that the proposed corrections successfully remove the rare events bias and outperform the other ensemble classifiers that were considered. The large flexibility and high interpretability of the proposed methods are also illustrated.
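The ingredients named in the abstract (binomial loss, shrinkage, subsampling, and a fixed number of boosting iterations) can be illustrated with a minimal sketch. This is not the authors' implementation or any of their proposed corrections. It assumes regression stumps as base learners, a single candidate split per variable, and leaf values equal to the mean residual, and every function name and default value below (`simulate`, `boost`, `class_accuracies`, and so on) is hypothetical. It reports the class-specific accuracies, since the proportion of correctly classified rare-class samples is the quantity the abstract describes as affected by the rare events bias.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=60, p=200, n_rare=6, shift=1.0, n_informative=10, seed=1):
    """Hypothetical imbalanced, high-dimensional data (p > n); class 1 is rare."""
    rng = random.Random(seed)
    y = [1] * n_rare + [0] * (n - n_rare)
    X = [[rng.gauss(shift if (yi == 1 and j < n_informative) else 0.0, 1.0)
          for j in range(p)] for yi in y]
    return X, y


def fit_stump(X, r, idx):
    """Least-squares stump on residuals r, using only the rows in idx."""
    best = None
    for j in range(len(X[0])):
        t = sum(X[i][j] for i in idx) / len(idx)  # one candidate split per variable
        left = [r[i] for i in idx if X[i][j] <= t]
        right = [r[i] for i in idx if X[i][j] > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, j, t, ml, mr)
    return best


def boost(X, y, n_iter=50, shrinkage=0.1, subsample=0.5, seed=0):
    """Gradient boosting with binomial loss, shrinkage and subsampling (sketch)."""
    rng = random.Random(seed)
    n = len(y)
    prior = sum(y) / n
    f0 = math.log(prior / (1.0 - prior))  # initial log-odds of the rare class
    F = [f0] * n
    stumps = []
    for _ in range(n_iter):
        idx = rng.sample(range(n), max(2, int(subsample * n)))  # without replacement
        # Negative gradient of the binomial loss with respect to F: y - p.
        r = [y[i] - sigmoid(F[i]) for i in range(n)]
        stump = fit_stump(X, r, idx)
        if stump is None:
            continue
        _, j, t, ml, mr = stump
        stumps.append((j, t, shrinkage * ml, shrinkage * mr))
        for i in range(n):
            F[i] += shrinkage * (ml if X[i][j] <= t else mr)
    return f0, stumps


def predict_proba(model, X):
    f0, stumps = model
    return [sigmoid(f0 + sum(a if x[j] <= t else b for j, t, a, b in stumps))
            for x in X]


def class_accuracies(y, prob, threshold=0.5):
    """Proportion of correctly classified samples, separately for each class."""
    acc = {}
    for c in (0, 1):
        members = [q for yi, q in zip(y, prob) if yi == c]
        acc[c] = sum((q > threshold) == bool(c) for q in members) / len(members)
    return acc


X, y = simulate()
model = boost(X, y, n_iter=50, shrinkage=0.1, subsample=0.5)
print(class_accuracies(y, predict_proba(model, X)))
```

The accuracies printed here are computed on the training data, so they only show how the pieces fit together. A real assessment would use independent test data or cross-validation and would vary `n_iter`, `shrinkage` and `subsample`.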
Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 113, September 2017, Pages 19-37