Files

Abstract

This paper addresses the issue of interpretability and auditability of reinforcement-learning agents employed in the recovery of unsecured consumer debt. To this end, we develop a deterministic policy-gradient method that allows for a natural integration of domain expertise into the learning procedure so as to encourage learning of consistent, and thus interpretable, policies. Domain knowledge can often be expressed in terms of policy monotonicity and/or convexity with respect to relevant state inputs. We augment the standard actor–critic policy approximator using a monotonically regularized loss function which integrates domain expertise into the learning. Our formulation overcomes the challenge of learning interpretable policies by constraining the search to policies satisfying structural-consistency properties. The resulting state-feedback control laws can be readily understood and implemented by human decision makers. This new domain-knowledge enhanced learning approach is applied to the problem of optimal debt recovery which features a controlled Hawkes process and an asynchronous action–feedback relationship.

Details

PDF