Supervised multidimensional scaling for visualization, classification, and bipartite ranking

Article ID	Journal	Published Year	Pages	File Type
417712	Computational Statistics & Data Analysis	2011	13 Pages	PDF

Abstract

Least squares multidimensional scaling (MDS) is a classical method for representing an n×n dissimilarity matrix D. One seeks a set of configuration points z_1,…,z_n ∈ R^S such that D is well approximated by the Euclidean distances between the configuration points: D_ij ≈ ‖z_i − z_j‖_2. Suppose that in addition to D, a vector of associated binary class labels y ∈ {1,2}^n corresponding to the n observations is available. We propose an extension to MDS that incorporates this outcome vector. Our proposal, supervised multidimensional scaling (SMDS), seeks a set of configuration points z_1,…,z_n ∈ R^S such that D_ij ≈ ‖z_i − z_j‖_2, and such that z_is > z_js for s = 1,…,S tends to occur when y_i > y_j. This results in a new way to visualize the observations. In addition, we show that SMDS leads to a method for the classification of test observations, which can also be interpreted as a solution to the bipartite ranking problem. This method is explored in a simulation study, as well as on a prostate cancer gene expression data set and on a handwritten digits data set.
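
The two properties named in the abstract can be made concrete with a small numerical check. The following Python sketch is illustrative only and is not the authors' objective function or fitting algorithm, which the abstract does not state. It computes the unweighted least squares discrepancy between D and the configuration distances, and, for each coordinate s, the fraction of pairs with y_i > y_j for which z_is > z_js. The function names (pairwise_distances, raw_stress, ordering_agreement) and the toy data are assumptions made for this example.

```python
import numpy as np


def pairwise_distances(Z):
    """Euclidean distances between all rows of an (n, S) configuration Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(D, Z):
    """Sum over pairs i < j of (D_ij - ||z_i - z_j||_2)^2.

    Assumption: an unweighted least squares criterion; the paper's exact
    criterion may differ.
    """
    E = pairwise_distances(Z)
    iu = np.triu_indices(len(D), k=1)  # each unordered pair counted once
    return float(((D[iu] - E[iu]) ** 2).sum())


def ordering_agreement(Z, y):
    """For each coordinate s, the fraction of pairs (i, j) with y_i > y_j
    for which z_is > z_js. Hypothetical diagnostic, not taken from the paper.
    """
    y = np.asarray(y)
    higher = y[:, None] > y[None, :]          # (n, n) mask of pairs y_i > y_j
    if not higher.any():
        return np.full(Z.shape[1], np.nan)    # undefined with a single class
    greater = Z[:, None, :] > Z[None, :, :]   # (n, n, S) coordinate comparisons
    return greater[higher].mean(axis=0)


# Toy data (assumed): three points on a line in R^2, with labels in {1, 2}.
Z = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
y = np.array([1, 1, 2])
D = pairwise_distances(Z)  # dissimilarities reproduced exactly by Z

print(raw_stress(D, Z))
print(ordering_agreement(Z, y))
```

A configuration that fits D well has a small raw_stress value, and one that also respects the class ordering has ordering_agreement entries near 1 in every coordinate. How the two goals are traded off against each other is specified in the paper itself, not in this sketch.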