Empirical explorations of strategic reinforcement learning: a case study in the sorting problem; pp. 186–196
DOI: 10.3176/proc.2020.3.02

Ching-Sheng Lin, Jung-Sing Jwo, Cheng-Hsiung Lee, Ya-Ching Lo

Recent advances in deep learning and reinforcement learning have made it possible to create agents capable of mimicking human behaviours. In this paper, we investigate how a reinforcement learning agent behaves under different learning strategies and whether it can, in principle, complete a task at a level similar to human performance. To study the effect of different reward types, we introduce two reward schemes: immediate reward and pure-delayed reward. To build a more human-like agent when interacting with the environment, we propose a goal-driven design that forces the agent to achieve a level close to human ability, together with a training mechanism that learns only from good trajectories. We employ Q-learning, one of the most popular reinforcement learning algorithms, for our study. As the sorting problem is a classical topic in theoretical computer science with widespread applications, we use it for the empirical evaluation and compare our results against the algorithmic solutions.
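To make the contrast between the two reward schemes concrete, the following is a minimal sketch of tabular Q-learning on a toy sorting task. It is not the design used in the paper. The state encoding (a permutation of three items), the action set (adjacent swaps), the exact reward values, the hyperparameters, and the names `train_sorter` and `greedy_rollout` are all assumptions made for illustration only.

```python
import random
from itertools import permutations


def train_sorter(n=3, scheme="immediate", episodes=2000, alpha=0.5,
                 gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning on permutations of range(n) (illustrative only).

    Assumed setup, not taken from the paper:
    - state: a tuple holding the current order of the items
    - action i: swap the items at positions i and i + 1
    - "immediate" scheme: +1 if the swap removes an inversion, -1 otherwise
    - "delayed" scheme: +1 only on the step that reaches the sorted order
    """
    rng = random.Random(seed)
    goal = tuple(range(n))
    states = list(permutations(range(n)))
    actions = list(range(n - 1))
    q = {(s, a): 0.0 for s in states for a in actions}

    for _ in range(episodes):
        state = rng.choice(states)
        for _ in range(max_steps):
            if state == goal:
                break
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            # Apply the adjacent swap.
            items = list(state)
            items[action], items[action + 1] = items[action + 1], items[action]
            nxt = tuple(items)
            done = nxt == goal

            if scheme == "immediate":
                reward = 1.0 if state[action] > state[action + 1] else -1.0
            else:  # pure-delayed reward
                reward = 1.0 if done else 0.0

            # Standard Q-learning update; terminal states have zero value.
            best_next = 0.0 if done else max(q[(nxt, a)] for a in actions)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
    return q


def greedy_rollout(q, start, max_steps=20):
    """Follow the greedy policy from `start` and return the final state."""
    state = tuple(start)
    goal = tuple(sorted(state))
    actions = range(len(state) - 1)
    for _ in range(max_steps):
        if state == goal:
            break
        action = max(actions, key=lambda a: q[(state, a)])
        items = list(state)
        items[action], items[action + 1] = items[action + 1], items[action]
        state = tuple(items)
    return state
```

Under these assumptions, both schemes can be trained with `train_sorter(scheme="immediate")` and `train_sorter(scheme="delayed")`, and the learned policies compared by calling `greedy_rollout` on each starting permutation. The goal-driven design and the training on good trajectories only are not modelled here.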


