
Abstract

By incorporating known constraints into the inverse reinforcement learning (IRL) framework, constrained inverse reinforcement learning (CIRL) can learn behaviors from expert demonstrations while satisfying a set of pre-defined constraints. This makes CIRL relevant in safety-critical domains, as it provides a direct way to devise AI systems that enforce safety requirements. This master's thesis proposes and analyzes an algorithm, termed NPG-CIRL, that solves the CIRL problem. Our algorithm implements a primal-dual scheme that extends the natural policy gradient (NPG) algorithm to the CIRL setting. We provide a finite-time analysis of the algorithm's global convergence in the idealized exact-gradient setting and the more practical stochastic-gradient setting. We show that the algorithm requires $O(1/\epsilon^2)$ gradient evaluations to reach an $\epsilon$-approximate solution and to satisfy the imposed constraints. Our analysis also quantifies the sample complexity, showing that the algorithm requires $O(1/\epsilon^4)$ samples to achieve convergence when using Monte Carlo gradient estimation techniques.
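To make the primal-dual structure concrete, the following is a minimal toy sketch, not the thesis's NPG-CIRL implementation: a tabular MDP with known dynamics, a linear reward model matched to given expert feature expectations (IRL step), a tabular-softmax NPG step on the Lagrangian reward, and a projected dual ascent step on a single cost constraint. All names, step sizes, and the random MDP are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy CIRL setup: random MDP, reward features, one cost constraint J_c(pi) <= b.
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))      # transition kernel P[s, a, s']
phi = rng.normal(size=(S, A, 4))                # reward features phi(s, a)
cost = rng.uniform(size=(S, A))                 # constraint cost c(s, a)
b = 2.0                                         # constraint budget
mu0 = np.full(S, 1.0 / S)                       # initial state distribution
mu_E = rng.normal(size=4)                       # expert feature expectations (assumed given)

def policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def q_values(pi, r):
    """Exact Q^pi for reward r via the Bellman linear system."""
    r_pi = (pi * r).sum(axis=1)                 # expected one-step reward per state
    P_pi = np.einsum('sa,sap->sp', pi, P)       # state transition matrix under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ v                    # Q[s, a]

def occupancy(pi):
    """Normalized discounted state-action occupancy measure of pi."""
    P_pi = np.einsum('sa,sap->sp', pi, P)
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d[:, None] * pi

theta = np.zeros((S, A))                        # policy logits (primal variable)
w = np.zeros(4)                                 # reward parameters (IRL variable)
lam = 0.0                                       # dual variable for the constraint
eta_pi, eta_w, eta_lam = 1.0, 0.1, 0.1

for t in range(200):
    pi = policy(theta)
    d = occupancy(pi)
    # Primal step: NPG on the Lagrangian reward r_w - lam * c; for tabular softmax
    # policies this is a logit step proportional to the Q-values (equivalently, advantages).
    q_L = q_values(pi, phi @ w - lam * cost)
    theta = theta + eta_pi * q_L
    # IRL step: move the reward toward matching the expert's feature expectations.
    mu_pi = np.einsum('sa,saf->f', d, phi) / (1 - gamma)
    w = w + eta_w * (mu_E - mu_pi)
    # Dual step: projected ascent on the constraint violation J_c(pi) - b.
    J_c = (d * cost).sum() / (1 - gamma)
    lam = max(0.0, lam + eta_lam * (J_c - b))

print("constraint value J_c:", J_c, "dual variable:", lam)
```

The sketch uses exact policy evaluation and exact occupancy measures, i.e. the idealized exact-gradient setting mentioned in the abstract; the stochastic-gradient analysis in the thesis instead relies on Monte Carlo estimates of these quantities.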
