Files

Résumé

In this work, we introduce a setup where a monitoring entity attempts to distinguish a cheating player among a group of regular players where all players behave in order to maximize their reward. We assume that the cheating player has an ``information advantage'' compared to the regular players. However, greedily exploiting this advantage will lead to the cheating player being easily distinguishable from its peers. Hence there is a tension between exploitation of the said advantage and the probability of being caught. We characterize this trade-off showing that the cheating player can obtain a higher reward as the number of regular players grows. We also show that, under a certain regime, a monitoring strategy based on the empirical divergence function attains the same normalized reward as the minimax reward.

Détails

PDF