Article ID: 486462
Journal: Procedia Computer Science
Published Year: 2013
Pages: 7
File Type: PDF
Abstract

When data are high-dimensional and of mixed type and the response variable is categorical, an effective executable profile consists of categorical or categorized variables with easily understandable statistics. Many data mining techniques require categorical variables, and many others produce better results when continuous variables are converted to categorical ones. A continuous variable can be discretized in either a supervised way or an unsupervised (conventional) way. We propose a supervised discretization method that uses maximization of the Goodman-Kruskal tau (GK-τ) as the optimization criterion. This optimization is oriented toward the probabilistic averaging effect. An experiment on financial loan application data is designed to show the improvement after discretization. Some technical concerns that arise during discretization are also discussed.
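
To make the criterion concrete, the following is a minimal sketch of how GK-τ could be computed for a discretized predictor and a categorical response, and how a single cut point might be chosen to maximize it. The abstract does not describe the authors' actual algorithm, so the exhaustive one-cut search, the function names `gk_tau` and `best_binary_cut`, and the toy data in the usage note are all assumptions made for illustration only.

```python
from collections import Counter


def gk_tau(x_labels, y_labels):
    """Goodman-Kruskal tau for predicting y from x (both categorical).

    tau = (V(Y) - E[V(Y|X)]) / V(Y), where V is the Gini variation
    1 - sum of squared category proportions.
    """
    n = len(y_labels)
    joint = Counter(zip(x_labels, y_labels))
    x_marg = Counter(x_labels)
    y_marg = Counter(y_labels)

    # Variation of the response on its own.
    v_y = 1.0 - sum((c / n) ** 2 for c in y_marg.values())
    if v_y == 0.0:
        return 0.0  # the response is constant; nothing to explain

    # Expected variation of the response within each category of x.
    within = 1.0 - sum(c * c / (n * x_marg[x]) for (x, _), c in joint.items())
    return (v_y - within) / v_y


def best_binary_cut(values, y_labels):
    """Hypothetical helper: try every midpoint between adjacent distinct
    values and keep the cut whose two bins give the largest tau."""
    distinct = sorted(set(values))
    best_cut, best_tau = None, -1.0
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2.0
        bins = [v <= cut for v in values]  # two-category version of the variable
        tau = gk_tau(bins, y_labels)
        if tau > best_tau:
            best_cut, best_tau = cut, tau
    return best_cut, best_tau
```

As a usage illustration with made-up values, `best_binary_cut([1, 2, 3, 10, 11, 12], ["no", "no", "no", "yes", "yes", "yes"])` selects the midpoint between 3 and 10, because that cut separates the two response categories completely. A multi-interval discretization would need a search over several cut points, which this sketch does not attempt.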
