Article code: 534840
Journal code: 870297
Publication year: 2011
English article: 6-page PDF
Full-text version: Free download
English title of the ISI article
A sparse version of the ridge logistic regression for large-scale text categorization
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Vision and Pattern Recognition
Abstract

Ridge logistic regression has been used successfully in text categorization problems, where it has been shown to reach the same performance as the Support Vector Machine, with the main advantage of computing a probability value rather than a score. However, the dense solution of the ridge makes it impractical for large-scale categorization. On the other hand, LASSO regularization is able to produce sparse solutions, but its performance is dominated by the ridge when the number of features is larger than the number of observations and/or when the features are highly correlated. In this paper, we propose a new model selection method which tries to approximate the ridge solution by a sparse solution. The method first computes the ridge solution and then performs feature selection. The experimental evaluations show that our method gives a solution which is a good trade-off between the ridge and LASSO solutions.
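
For readers unfamiliar with the two penalties contrasted in the abstract, the standard textbook objectives can be written as follows. The notation here (labels $y_i \in \{0,1\}$, feature vectors $x_i$, weight vector $w$, regularization strength $\lambda$) is an assumption made for illustration and is not taken from the paper itself.

```latex
% Penalized logistic regression: textbook forms, notation assumed.
\begin{align}
  p_i(w) &= \frac{1}{1 + \exp(-w^\top x_i)} \\
  L(w)   &= -\frac{1}{n}\sum_{i=1}^{n}
            \bigl[\, y_i \log p_i(w) + (1 - y_i)\log\bigl(1 - p_i(w)\bigr) \bigr] \\
  \hat{w}_{\text{ridge}} &= \arg\min_{w}\; L(w) + \frac{\lambda}{2}\,\lVert w \rVert_2^2 \\
  \hat{w}_{\text{LASSO}} &= \arg\min_{w}\; L(w) + \lambda\,\lVert w \rVert_1
\end{align}
```

The squared $\ell_2$ penalty shrinks all weights but generally leaves every one of them nonzero (the dense solution), whereas the $\ell_1$ penalty can drive individual weights exactly to zero (the sparse solution).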

Research highlights
► Ridge penalization for the logistic regression outperforms LASSO in different settings.
► Ridge models do not scale well in terms of documents, features and categories.
► A sparse ridge solution can be obtained by applying a LASSO-like feature selection.
► This version retains only the features strongly supported by the data.
► This new version can efficiently be used in Large Scale Text Categorization.
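
The two-stage idea in the highlights (fit a ridge model, then select features) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' procedure: the listing does not describe the paper's LASSO-like selection rule, so a plain keep-the-k-largest-weights rule is swapped in as a stand-in. The function names, toy data, and hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.5, epochs=500):
    """Batch gradient descent on mean log-loss + (lam/2)*||w||^2, no intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the ridge penalty.
        grad = [lam * wj for wj in w]
        # Gradient of the mean log-loss.
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def sparsify(w, k):
    """Keep the k largest-magnitude weights and zero the rest.

    Assumption: a stand-in for the paper's selection step, which is
    not specified in this listing.
    """
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


# Toy data (hypothetical): 2 informative features followed by 8 noise features.
random.seed(0)
X, y = [], []
for _ in range(200):
    label = random.randint(0, 1)
    sign = 1.0 if label == 1 else -1.0
    row = [sign + random.gauss(0, 1), sign + random.gauss(0, 1)]
    row += [random.gauss(0, 1) for _ in range(8)]
    X.append(row)
    y.append(label)

w_ridge = fit_ridge_logistic(X, y)   # stage 1: dense ridge solution
w_sparse = sparsify(w_ridge, k=2)    # stage 2: feature selection
```

The sparse vector reuses the ridge weights for the retained features, so prediction only needs to touch those features. That saving in storage and scoring cost is the kind of benefit the highlights attribute to a sparse ridge solution.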

Publisher
Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 32, Issue 2, 15 January 2011, Pages 101–106
Authors
, , , , ,