Optimally solving Markov decision processes with total expected discounted reward function: Linear programming revisited

Article ID	Journal	Published Year	Pages	File Type
1133566	Computers & Industrial Engineering	2015	6 Pages	PDF

Abstract

•Compared computational performance of linear programming and the policy iteration.•Considered only discrete-time infinite-horizon MDPs with discounted reward.•Used randomly generated test problems and a real-life health-care problem.•Showed that, unlike previously reported, barrier methods for LP provide a viable tool.•LP is more effective when the transition probability matrix has a diagonal structure.

We compare the computational performance of linear programming (LP) and the policy iteration algorithm (PIA) for solving discrete-time infinite-horizon Markov decision process (MDP) models with total expected discounted reward. We use randomly generated test problems as well as a real-life health-care problem to empirically show that, unlike previously reported, barrier methods for LP provide a viable tool for optimally solving such MDPs. The dimensions of comparison include transition probability matrix structure, state and action size, and the LP solution method.

Keywords

MDP Linear programming Treatment optimization Policy iteration Markov decision process