Powered embarrassing parallel MCMC sampling in Bayesian inference, a weighted average intuition

Article ID	Journal	Published Year	Pages
4949233	Computational Statistics & Data Analysis	2017	10

Abstract

Although Markov chain Monte Carlo (MCMC) is very popular for parameter inference, reducing its computational burden is crucial because of limits on processors and memory and the disk I/O bottleneck, especially when handling big data. In recent years, researchers have developed parallel MCMC algorithms in which the full data are partitioned into subdatasets, and samples are drawn from each subdataset independently on different machines, without communication between them. The existing literature treats all machines as identical. However, because of the heterogeneity of the data assigned to different machines and the random nature of MCMC, this assumption of “identical machines” is questionable. Here we propose a Powered Embarrassing Parallel MCMC (PEPMCMC) algorithm, in which the full-data posterior density is the product of the sub-posterior densities (the posterior densities of the different subdatasets), each raised to a constrained power. We prove that this is equivalent to a weighted averaging procedure. The powers are determined by a maximum likelihood criterion, which amounts to finding the maximum likelihood point within the convex hull of the estimates from the different machines. We prove that the algorithm is asymptotically exact and apply it to several examples, demonstrating its advantages over both the non-parallel algorithm and the unpowered parallel algorithm. Furthermore, we investigate the connection between normal kernel density estimation and parametric density estimation under certain conditions.
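The weighted-average reading can be illustrated with a small, hypothetical sketch. It is not the authors' implementation, and the paper's exact criterion and constraints on the powers may differ. It rests on a standard Gaussian fact: if each sub-posterior is approximately Gaussian with mean μ_m and covariance S_m, then raising it to a power w_m > 0 and multiplying gives, up to normalization, a Gaussian with mean (sum_m w_m S_m^{-1})^{-1} sum_m w_m S_m^{-1} μ_m, a weighted average of the μ_m. The sketch assumes a toy N(θ, I) model with a flat prior. It splits synthetic data into 8 shards and runs an independent random-walk Metropolis chain on each shard. It then uses projected gradient ascent, with a sort-based projection onto the probability simplex, to find the point in the convex hull of the shard posterior means with the highest full-data log-likelihood. The function names (log_lik, rw_metropolis, project_to_simplex, ml_point_in_hull), step sizes, and iteration counts are illustrative assumptions.

```python
# Hypothetical sketch: assumes a N(theta, I) model with a flat prior.
# It illustrates the weighted-average idea; it is not the PEPMCMC authors' code.
import numpy as np


def log_lik(theta, x):
    """Log-likelihood of rows of x under N(theta, I), up to an additive constant."""
    return -0.5 * np.sum((x - theta) ** 2)


def rw_metropolis(shard, n_iter, step, rng):
    """Random-walk Metropolis on one shard's sub-posterior (flat prior assumed)."""
    theta = shard.mean(axis=0)  # start at the sub-posterior mode, so no burn-in is needed
    cur = log_lik(theta, shard)
    draws = np.empty((n_iter, shard.shape[1]))
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        new = log_lik(prop, shard)
        if np.log(rng.uniform()) < new - cur:  # Metropolis accept/reject step
            theta, cur = prop, new
        draws[t] = theta
    return draws


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)


def ml_point_in_hull(estimates, x, n_iter=5000):
    """Maximize the full-data log_lik over theta = alpha @ estimates, alpha on the simplex."""
    center = estimates.mean(axis=0)
    c = estimates - center  # centering leaves theta unchanged (weights sum to 1) but improves scaling
    n, xbar = x.shape[0], x.mean(axis=0)
    # The Hessian of log_lik in alpha is -n * c @ c.T, so 1 / lip is a safe ascent step.
    lip = max(n * np.linalg.eigvalsh(c @ c.T)[-1], 1e-12)
    alpha = np.full(len(estimates), 1.0 / len(estimates))  # start from the plain average
    for _ in range(n_iter):
        theta = center + alpha @ c
        grad = n * (c @ (xbar - theta))  # gradient of log_lik with respect to alpha
        alpha = project_to_simplex(alpha + grad / lip)
    return alpha, center + alpha @ c


rng = np.random.default_rng(0)
data = rng.normal(loc=[1.0, -2.0], scale=1.0, size=(8000, 2))  # synthetic data
shards = np.array_split(data, 8)  # one subdataset per (simulated) machine

# Embarrassingly parallel step: each chain only sees its own shard (run in a loop here).
estimates = np.array([rw_metropolis(s, 4000, 0.05, rng).mean(axis=0) for s in shards])

alpha, theta_hat = ml_point_in_hull(estimates, data)
print("weights:", np.round(alpha, 3))
print("combined estimate:", theta_hat)
print("full-data MLE (sample mean):", data.mean(axis=0))
```

For this toy model the full-data maximum likelihood estimate is the sample mean, so the combined estimate can be checked against it directly. In a genuinely distributed setting, the full-data log-likelihood at a candidate point would be computed as a sum of per-shard log-likelihoods, with one evaluation per machine, rather than by gathering all the data in one place.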

Keywords

Maximum likelihood; Markov chain Monte Carlo; Weighted average