Article ID: 478480
Journal: European Journal of Operational Research
Published Year: 2011
Pages: 13
File Type: PDF
Abstract

The semiconductor test scheduling problem is a variation of the reentrant unrelated parallel machine problem with multiple resource constraints, intricate {product, tester, kit, enabler assembly} eligibility constraints, sequence-dependent setup times, etc. A multi-step reinforcement learning (RL) algorithm called Sarsa(λ, k) is proposed and applied to this scheduling problem with a throughput-related objective. Allowing enabler reconfiguration expands the production capacity of the test facility, and scheduling optimization is performed at the bottom level. Two forms of Sarsa(λ, k), i.e. forward view Sarsa(λ, k) and backward view Sarsa(λ, k), are constructed and proved equivalent in off-line updating. An upper bound on the error of the action-value function in tabular Sarsa(λ, k) is provided for deterministic problems. To apply Sarsa(λ, k), the scheduling problem is transformed into an RL problem by representing states and constructing actions, the reward function, and the function approximator. Sarsa(λ, k) achieves a mean scheduling objective value smaller than that of the Industrial Method (IM) by 68.59% for real industrial problems and by 76.89% for randomly generated test problems. Computational experiments show that Sarsa(λ, k) outperforms IM and every individual action constructed from heuristics derived from existing heuristics or scheduling rules.

► We propose a multi-step reinforcement learning algorithm called Sarsa(λ, k).
► We construct forward view Sarsa(λ, k) and backward view Sarsa(λ, k) and prove their equivalence in off-line updating.
► We provide the upper bound of the error of the action-value function in tabular Sarsa(λ, k) when solving deterministic problems.
► Sarsa(λ, k) outperforms the Industrial Method and any individual action.
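
For readers unfamiliar with the backward view mentioned above, the following is a minimal sketch of standard tabular backward-view Sarsa(λ) with accumulating eligibility traces. It is not the paper's Sarsa(λ, k): the abstract does not define the parameter k, so it is left out here. The toy deterministic chain environment, the function name `sarsa_lambda`, and all hyperparameter values are assumptions chosen only for illustration.

```python
import random


def sarsa_lambda(n_states=6, episodes=500, alpha=0.1, gamma=0.95,
                 lam=0.8, epsilon=0.1, seed=0):
    """Standard tabular backward-view Sarsa(lambda) on a toy chain.

    Hypothetical environment (not from the paper): states 0..n_states-1,
    actions 0 (move left) and 1 (move right), a cost of -1 per step,
    and a terminal goal at the right end with reward 0 on arrival.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    def choose(s):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q[s])
        return rng.choice([a for a in range(2) if q[s][a] == best])

    for _ in range(episodes):
        # Eligibility traces are reset at the start of every episode.
        e = [[0.0, 0.0] for _ in range(n_states)]
        s = 0
        a = choose(s)
        while s != goal:
            # Deterministic transition, clipped to the ends of the chain.
            s2 = min(max(s + moves[a], 0), goal)
            r = 0.0 if s2 == goal else -1.0
            a2 = choose(s2)
            # One-step Sarsa TD error; no bootstrap from the terminal state.
            target = r if s2 == goal else r + gamma * q[s2][a2]
            delta = target - q[s][a]
            e[s][a] += 1.0  # accumulating trace
            # Every state-action pair is updated in proportion to its
            # trace, then all traces decay by gamma * lambda.
            for i in range(n_states):
                for j in range(2):
                    q[i][j] += alpha * delta * e[i][j]
                    e[i][j] *= gamma * lam
            s, a = s2, a2
    return q
```

Calling `sarsa_lambda()` returns the learned action-value table; a greedy policy is read off by taking, in each state, the action with the larger value. Setting `lam=0.0` reduces the update to one-step Sarsa, which is one way to see how the trace spreads each TD error over earlier state-action pairs.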

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Science (General)