Article ID | Journal | Published Year | Pages |
---|---|---|---|
536668 | Pattern Recognition Letters | 2008 | 8 |
In this paper, we compare three measures for computing Mahalanobis-type distances between random variables that have several categorical dimensions, or a mix of categorical and numeric dimensions: regular simplex, tensor product space, and symbolic covariance. The tensor product space and symbolic covariance distances are new contributions. We test the methods in two application domains: classification and principal components analysis. We find that the tensor product space distance is impractical for most problems. Overall, the regular simplex method is the most successful in both domains, but the symbolic covariance method has several advantages, including time and space efficiency, applicability to different contexts, and theoretical neatness.
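
To make the regular simplex idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes each categorical dimension is mapped to one-hot vectors (which are the vertices of a regular simplex, since every pair of distinct categories is the same distance apart), that numeric dimensions are appended unchanged, and that the Mahalanobis-type distance uses the pseudo-inverse of the sample covariance, because one-hot columns make the covariance matrix singular. The function names and the toy data are hypothetical.

```python
import numpy as np

def simplex_encode(column):
    """Map each category to a one-hot vector (a vertex of a regular simplex)."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        encoded[row, index[value]] = 1.0
    return encoded

def inverse_covariance(X):
    """Pseudo-inverse of the sample covariance (assumption: the covariance
    is singular because each one-hot block sums to 1 in every row)."""
    return np.linalg.pinv(np.cov(X, rowvar=False))

def mahalanobis(x, y, inv_cov):
    """Mahalanobis-type distance between two encoded observations."""
    d = x - y
    # Clip tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(d @ inv_cov @ d, 0.0)))

# Hypothetical toy data: two categorical dimensions and one numeric dimension.
colour = ["red", "green", "blue", "red", "green", "blue", "red", "blue"]
size = ["small", "large", "small", "large", "small", "large", "small", "small"]
weight = np.array([1.0, 2.5, 1.2, 2.8, 0.9, 3.1, 1.1, 1.4])

X = np.hstack([simplex_encode(colour), simplex_encode(size), weight[:, None]])
inv_cov = inverse_covariance(X)
d01 = mahalanobis(X[0], X[1], inv_cov)
```

Here `d01` is the distance between the first two observations. Under this encoding, a variable with k categories contributes k columns, so the covariance matrix grows with the total number of categories across all dimensions.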