
A gradient-based reinforcement learning approach to dynamic pricing in partially observable environments
Keywords: Reinforcement learning; Dynamic pricing; Grid; Policy gradient