Abstract

One major challenge in distributed learning is to learn efficiently for each client when the data across clients is heterogeneous, or non-IID (not independent and identically distributed). This setting is difficult because the data of other clients may not be helpful to any individual client. Thus the following question arises: can each individual client's performance be improved with access to the data of other clients in this heterogeneous data setting? A further challenge is to obtain a good personalized model while still maintaining the privacy of local data samples. We consider a model where the client data distributions are not identical and can be dependent. In this heterogeneous data setting we study the problem of distributed learning of data distributions. We propose a personalized linear estimator for each client and show that this estimator is never worse, and can be substantially better (by up to a factor equal to the number of clients), than the sample mean estimator, while still concentrating around the true probability. This estimator can be implemented by privacy-preserving schemes in both the cryptographic and differentially private settings.

Details
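
The abstract does not spell out the estimator's form. As a minimal sketch, one plausible shape for a personalized linear estimator is a convex combination of a client's own sample mean and the pooled mean across all clients, with a data-dependent weight. The weighting rule below is a hypothetical illustration and does not reproduce the paper's guarantees; the function name, the shrinkage rule, and the Bernoulli setup are all assumptions made for the example.

    import numpy as np

    # Hypothetical sketch: each client i holds Bernoulli samples with
    # unknown mean p_i. The personalized estimate interpolates between the
    # client's local sample mean and the pooled mean over all clients.
    def personalized_estimate(samples, client):
        """samples: list of 1-D arrays, one per client.
        Returns a personalized estimate of p_client."""
        local = samples[client]
        local_mean = local.mean()
        pooled_mean = np.concatenate(samples).mean()
        # Local sampling noise scale for a Bernoulli mean (assumed rule):
        noise = np.sqrt(local_mean * (1 - local_mean) / max(len(local), 1) + 1e-12)
        gap = abs(local_mean - pooled_mean)
        # Hypothetical weight: if the pooled mean is far from the local mean
        # relative to local noise, trust the local data (lam -> 1); if the
        # clients look similar, shrink toward the pooled mean (lam -> 0).
        lam = gap / (gap + noise)
        return lam * local_mean + (1 - lam) * pooled_mean

    rng = np.random.default_rng(0)
    # Client 0 has few samples; the other clients have similar distributions,
    # so pooling can sharply reduce client 0's estimation error.
    samples = [rng.binomial(1, 0.30, size=20).astype(float)] + \
              [rng.binomial(1, 0.32, size=200).astype(float) for _ in range(9)]
    print(personalized_estimate(samples, client=0))

In this toy run the weight adapts per client, which mirrors the abstract's claim qualitatively: a client whose distribution resembles the others borrows strength from their samples, while a client with an outlying distribution falls back on its own sample mean.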