Article code: 476347
Journal code: 699453
Publication year: 2006
English article: 17-page PDF
Full-text version: free download
English title of the ISI article
Evaluating the performance of cost-based discretization versus entropy- and error-based discretization
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science (General)
English abstract

Discretization is defined as the process that divides continuous numeric values into intervals of discrete categorical values. In this article, the concept of cost-based discretization as a pre-processing step to the induction of a classifier is introduced in order to obtain an optimal multi-interval splitting for each numeric attribute. A transparent description of the method and the steps involved in cost-based discretization are given. The aim of this paper is to present this method and to assess the potential benefits of such an approach. Furthermore, its performance against two other well-known methods, i.e. entropy- and pure error-based discretization, is examined. To this end, experiments on 14 data sets, taken from the UCI Repository on Machine Learning, were carried out. In order to compare the different methods, the area under the Receiver Operating Characteristic (ROC) graph was used and tested on its level of significance. For most data sets the results show that cost-based discretization achieves satisfactory results when compared to entropy- and error-based discretization.

Statement of scope and purpose

Given its importance, many researchers have already contributed to the issue of discretization in the past. However, to the best of our knowledge, no efforts have been made yet to include the concept of misclassification costs to find an optimal multi-split for discretization purposes, prior to induction of the decision tree. For this reason, this new concept is introduced and explored in this article by means of operations research techniques.
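
The difference between the three criteria named in the abstract can be illustrated with a small Python sketch. It is not the paper's method: the abstract describes an optimal multi-interval split found with operations research techniques, whereas this sketch only picks a single binary cut point, which is a deliberate simplification. The function names (`weighted_entropy`, `min_cost`, `best_cut`), the toy data, and both cost matrices are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def weighted_entropy(labels):
    """Class entropy of an interval times its size, so interval scores can be summed."""
    n = len(labels)
    return -sum(c * math.log2(c / n) for c in Counter(labels).values())

def min_cost(labels, cost):
    """Cheapest total cost of predicting one class for every example in the interval.

    cost[actual][predicted] is an assumed misclassification cost matrix.
    """
    counts = Counter(labels)
    return min(
        sum(cost[actual][predicted] * k for actual, k in counts.items())
        for predicted in cost
    )

def best_cut(values, labels, score):
    """Return (cut point, total score) of the single cut that minimizes the summed score."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    best = None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # a cut is only possible between distinct values
        total = score(ys[:i]) + score(ys[i:])
        if best is None or total < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, total)
    return best

# Toy data (hypothetical): one numeric attribute and a binary class label.
values = list(range(1, 10))
labels = ["neg", "neg", "neg", "pos", "neg", "neg", "pos", "pos", "pos"]

# Error-based discretization is the special case where every error costs 1.
unit_cost = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 1, "pos": 0}}
# Assumed skewed costs: missing a "pos" example is five times worse.
skewed_cost = {"neg": {"neg": 0, "pos": 1}, "pos": {"neg": 5, "pos": 0}}

entropy_cut = best_cut(values, labels, weighted_entropy)
error_cut = best_cut(values, labels, lambda ys: min_cost(ys, unit_cost))
cost_cut = best_cut(values, labels, lambda ys: min_cost(ys, skewed_cost))

print("entropy-based cut:", entropy_cut[0])
print("error-based cut:  ", error_cut[0])
print("cost-based cut:   ", cost_cut[0])
```

On this toy data the entropy-based and error-based criteria both place the cut at 6.5, while the skewed cost matrix moves it to 3.5, because isolating the lone "pos" example at value 4 in a "neg" interval becomes too expensive.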

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers & Operations Research - Volume 33, Issue 11, November 2006, Pages 3107–3123