Abstract

In this work, we introduce a setup where a monitoring entity attempts to distinguish a cheating player among a group of regular players where all players behave in order to maximize their reward. We assume that the cheating player has an "information advantage" compared to the regular players. However, greedily exploiting this advantage will lead to the cheating player being easily distinguishable from its peers. Hence there is a tension between exploitation of the said advantage and the probability of being caught. We characterize this trade-off showing that the cheating player can obtain a higher reward as the number of regular players grows. We also show that, under a certain regime, a monitoring strategy based on the empirical divergence function attains the same normalized reward as the minimax reward.

Details