Reinforcement learning for joint pricing, lead-time and scheduling decisions in make-to-order systems

Article ID	Journal	Published Year	Pages	File Type
476833	European Journal of Operational Research	2012	11 Pages	PDF

Abstract

The paper investigates a problem faced by a make-to-order (MTO) firm that can accept or reject orders and set prices and lead-times to influence demand. The model accounts for inventory holding costs for orders completed early, tardiness costs for orders delivered late, order rejection costs, variable manufacturing costs, and fixed costs. To maximize expected profit over an infinite planning horizon with stochastic demand, the firm must decide which orders to accept or reject, how to trade off price against lead-time, and how to weigh the potential for increased demand against capacity constraints. We model the problem as a Semi-Markov Decision Problem (SMDP) and develop a reinforcement learning (RL) based Q-learning algorithm (QLA) to solve it. We also build a discrete-event simulation model to validate the performance of the QLA, and compare the experimental results with two benchmark policies: the First-Come-First-Serve (FCFS) policy and a threshold heuristic policy. The results show that the QLA outperforms both benchmark policies.

► In this study, we investigate joint pricing, lead-time and scheduling decisions in MTO systems.
► We model the problem as a Semi-Markov Decision Problem (SMDP).
► We develop a reinforcement learning (RL) based Q-learning algorithm (QLA).
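
To make the combination of an SMDP and Q-learning more concrete, the following is a minimal sketch of a single tabular Q-learning update in which the time between decision epochs is random. It is not the paper's QLA: the abstract does not give the state, action, or reward definitions, nor whether a discounted or average-reward criterion is used. The sketch assumes a discounted criterion with continuous-time discount rate `beta`, and the function name `smdp_q_update`, the state and action labels, and all numeric values are hypothetical.

```python
import math
from collections import defaultdict


def smdp_q_update(q, state, action, reward, sojourn_time,
                  next_state, next_actions, alpha=0.1, beta=0.05):
    """Apply one discounted SMDP Q-learning update and return the new Q-value.

    Assumption (not from the paper): rewards are discounted continuously
    at rate `beta`, so a transition that takes `sojourn_time` time units
    discounts the value of the next state by exp(-beta * sojourn_time).
    """
    discount = math.exp(-beta * sojourn_time)
    # Greedy value of the next state over the actions available there.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + discount * best_next
    # Move the current estimate a fraction `alpha` toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical toy transition: the state is the number of orders in the
# system, and the decision is whether to accept an arriving order.
q = defaultdict(float)
new_value = smdp_q_update(
    q, state=2, action="accept", reward=10.0, sojourn_time=1.5,
    next_state=3, next_actions=["accept", "reject"],
)
```

In a full implementation the transitions would come from a simulation of order arrivals and completions, and the action set would also cover price and lead-time quotes; both are omitted here.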

Keywords

Q-learning; Reinforcement Learning (RL); Scheduling; Simulation-based optimization; Pricing