Title: Policy gradient methods find the Nash equilibrium in $N$-player general-sum linear-quadratic games
Authors: Ben Hambly - University of Oxford (United Kingdom)
Renyuan Xu - University of Oxford (United Kingdom)
Huining Yang - University of Oxford (United Kingdom) [presenting]
Abstract: Policy optimization algorithms have achieved substantial empirical success on a variety of non-cooperative multi-agent problems, including self-driving vehicles, real-time bidding games, and optimal execution in financial markets. However, few theoretical results explain why this class of reinforcement learning algorithms performs well in the presence of competition among agents. We explore the natural policy gradient method for a class of $N$-agent general-sum linear-quadratic games. We provide a global linear convergence guarantee for this approach in the setting of a finite time horizon and stochastic dynamics when there is a certain level of noise in the system. The noise can come either from the underlying dynamics or from carefully designed exploration by the agents. We illustrate our results with numerical experiments showing that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.
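As a loose illustration of the kind of method the abstract discusses (and not the authors' $N$-player algorithm), the sketch below runs plain gradient descent on the feedback gain of a scalar, single-agent, discounted LQR problem and compares the result with the Riccati solution. All parameter values (`a`, `b`, `q`, `r`, `gamma`) are made-up examples, and the finite-difference gradient stands in for the exact natural policy gradient analyzed in the paper.

```python
# Hedged illustration: policy gradient on a scalar discounted LQR
# problem with linear policy u_t = -k * x_t.  This is a toy
# single-agent stand-in, not the paper's N-player method.
a, b, q, r, gamma = 0.9, 0.5, 1.0, 0.1, 0.9  # made-up dynamics/cost
x0 = 1.0

def cost(k):
    """Exact discounted cost of u = -k x under deterministic dynamics."""
    a_cl = a - b * k  # closed-loop coefficient
    assert gamma * a_cl**2 < 1, "policy must be discounted-stabilizing"
    # J(k) = sum_t gamma^t (q + r k^2) x_t^2 with x_t = a_cl^t x0
    return (q + r * k**2) * x0**2 / (1 - gamma * a_cl**2)

# Gradient descent on the gain, with a finite-difference gradient.
k, lr, eps = 0.0, 0.05, 1e-6
for _ in range(500):
    grad = (cost(k + eps) - cost(k - eps)) / (2 * eps)
    k -= lr * grad

# Scalar discounted Riccati fixed point gives the optimal gain.
p = q
for _ in range(1000):
    p = q + gamma * a**2 * p - (gamma * a * b * p)**2 / (r + gamma * b**2 * p)
k_star = gamma * a * b * p / (r + gamma * b**2 * p)
print(k, k_star)  # the learned gain approaches k_star
```

In the deterministic scalar case the LQR cost is gradient dominated over the set of stabilizing gains, which is why this naive descent converges; the paper's contribution concerns the much harder general-sum multi-agent setting, where noise in the dynamics (or injected exploration) is what enables convergence.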