A multi-agent reinforcement learning approach to obtaining dynamic control policies for stochastic lot scheduling problem

Article ID	Journal	Published Year	Pages	File Type
10348782	Simulation Modelling Practice and Theory	2005	18 Pages	PDF

Abstract

This paper presents a methodology that uses intelligent agents to find a dynamic control policy for scheduling a single server across multiple products. The dynamic (state-dependent) policy optimizes a cost function based on WIP inventory, backorder penalty costs, and setup costs, while meeting the productivity constraints for the products. The methodology uses a simulation optimization technique called Reinforcement Learning (RL) and was tested on a stochastic lot-scheduling problem (SELSP) with a state-action space of size 1.8 × 10^7. The dynamic policies obtained through the RL-based approach outperformed various cyclic policies. The RL approach was implemented via a multi-agent control architecture in which a decision agent was assigned to each product. A neural-network-based approach (the least mean square (LMS) algorithm) was used to approximate the reinforcement value function during the implementation of the RL-based methodology. Finally, the dynamic control policy over the large state space was extracted from the reinforcement values using a commercially available tree classifier tool.
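
To make the LMS step mentioned in the abstract concrete, here is a minimal sketch of the Widrow-Hoff (LMS) rule fitting a linear approximation of a value function. It is an illustration under stated assumptions, not the paper's implementation: the feature vector (bias, inventory level, setup indicator), the toy target values, the learning rate, and the function names `lms_update` and `fit_linear_value` are all hypothetical.

```python
import random


def lms_update(weights, features, target, lr):
    """One Widrow-Hoff (LMS) step: w <- w + lr * (target - w.x) * x."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = target - prediction
    return [w + lr * error * x for w, x in zip(weights, features)]


def fit_linear_value(samples, n_features, lr=0.05, epochs=500, seed=0):
    """Sweep the samples repeatedly, applying one LMS step per sample."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in a random order each epoch
        for features, target in data:
            weights = lms_update(weights, features, target, lr)
    return weights


# Hypothetical toy data (an assumption for illustration only):
# features = [bias, inventory level, setup indicator],
# target value = 2.0 - 0.5 * inventory + 1.0 * setup.
samples = [
    ([1.0, inv, setup], 2.0 - 0.5 * inv + 1.0 * setup)
    for inv in (0.0, 1.0, 2.0, 3.0)
    for setup in (0.0, 1.0)
]

weights = fit_linear_value(samples, n_features=3)
print([round(w, 3) for w in weights])
```

Because the toy targets are an exact linear function of the features, the learned weights should approach the coefficients used to generate them. In an RL setting the target would instead be a sampled reinforcement value from simulation, and the weights would track it only approximately.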

Keywords

Scheduling; Simulation optimization; Reinforcement learning