Article ID: 6863774
Journal: Neurocomputing
Published Year: 2018
Pages: 38
File Type: PDF
Abstract
We introduce probabilistic principal component analysis (PPCA) into Dirichlet Process Mixtures of Generalized Linear Models (DPGLM) and propose a new model called Supervised Dirichlet Process Mixtures of Principal Component Analysis (SDPM-PCA). In SDPM-PCA, we assume that the covariates and the response variable are generated separately through the latent variable of PPCA, and are modeled nonparametrically using a Dirichlet Process Mixture. By jointly learning the latent variable, the cluster label, and the response variable, SDPM-PCA performs local dimensionality reduction within each mixture component and learns a supervised model based on the latent variable. In this way, SDPM-PCA improves the performance of both dimensionality reduction and prediction on high-dimensional data while retaining all the advantages of DPGLM. We also develop an inference algorithm for SDPM-PCA based on variational inference, which provides faster training and a deterministic approximation compared with sampling algorithms based on MCMC methods. Finally, we instantiate SDPM-PCA for regression problems with a Bayesian linear regression model. We test it on several real-world datasets and compare its prediction performance with DPGLM and other standard regression models. Experimental results show that, with a properly chosen latent dimension, SDPM-PCA provides better prediction performance on high-dimensional regression problems and avoids the curse-of-dimensionality problem in DPGLM.
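To make the generative structure described above concrete, the following is a minimal illustrative sketch (not the authors' code) of sampling from a truncated Dirichlet-process mixture in which each component draws high-dimensional covariates x from a low-dimensional PPCA latent z and the response y from the same z via a linear regression. The truncation level, dimensions, noise scales, and priors are placeholder assumptions chosen only for illustration.

```python
# Illustrative generative sketch of an SDPM-PCA-style model (assumed details).
import numpy as np

rng = np.random.default_rng(0)
K, D, d, N = 5, 50, 3, 200            # truncation level, ambient dim, latent dim, samples
alpha, sigma_x, sigma_y = 1.0, 0.1, 0.1

# Stick-breaking weights for the truncated Dirichlet process.
betas = rng.beta(1.0, alpha, size=K)
betas[-1] = 1.0
pi = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))

# Per-component PPCA loading W_k, mean mu_k, and regression weights w_k (placeholder priors).
W = rng.normal(size=(K, D, d))
mu = rng.normal(size=(K, D))
w = rng.normal(size=(K, d))

# Generative process: pick a component, draw the latent z, then x and y given z.
c = rng.choice(K, size=N, p=pi)                                 # cluster labels
z = rng.normal(size=(N, d))                                     # PPCA latent variables
x = np.einsum('ndk,nk->nd', W[c], z) + mu[c] + sigma_x * rng.normal(size=(N, D))
y = np.einsum('nk,nk->n', w[c], z) + sigma_y * rng.normal(size=N)
```

In the paper's setting, the cluster labels, latent variables, and component parameters would instead be inferred jointly from data via variational inference; the sketch only shows how x and y share a per-component low-dimensional latent representation.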
Related Topics
Physical Sciences and Engineering; Computer Science; Artificial Intelligence
Authors
, , ,