Files

Abstract

In this thesis we present and analyze approximation algorithms for three different clustering problems. The formulations of these problems are motivated by fairness and explainability considerations, two issues that have recently received attention in the algorithms and machine learning communities. In general, we are given a metric space and the task is to find a specified number of clusters to minimize some objective involving the centers of the clusters. Our first problem is the colorful k-center problem. In the classic k-center with outliers problem, the objective is to minimize the maximum distance from any point to its nearest center, while a given number of points can be omitted from contributing to the maximum. That is, these points are not served by any center. Colorful k-center is a generalization of this problem. Instead of only serving enough points, each point belongs to some color class and a certain number of points of each color are required to be served in a solution. When this problem was first introduced by [Ban+19], a pseudo-approximation using k + omega - 1 centers was given for general metric spaces where omega is the number of color classes. We give the first true approximation algorithm with k centers that gives a 3-approximation for colorful k-center. Next, we present our progress on the non-uniform k-center problem (NUkC or t-NUkC). In NUkC, we are given pairs (ki, ri), 1<=i<=t, sum ki =k, that represent ki balls of radius ri. The objective is to find Pt ki centers so that balls of radius alpha ri around ki of these centers cover all the points and alpha is minimized. NUkC was introduced by [CGK16], who conjectured that there exists a constant-factor approximation when t is constant. This is known as the NUkC conjecture. In a subsequent work, [CN21] gave a constant-factor approximation for t = 2 with outliers (robust 2-NUkC). We make progress on the NUkC conjecture by presenting a simple reduction of t-NUkC to robust (t-1)-NUkC, which, in light of [CN21], gives the first constant-factor approximation to 3-NUkC. Finally, we look at explainable algorithms for the k-medians and k-means objectives. Small decision trees are considered to be explainable models [Mur+19] for clustering, and in particular, when each node of the binary tree is a threshold on a single feature. Work by [Das+20] studied the theoretical guarantees obtainable from such a threshold tree, which they called the price of explainability. That is, the cost of an explainable clustering is compared to the cost of an optimal unrestricted clustering. We improve upon the results of [Das+20] to give nearly tight bounds on the price of explainability of the k-medians and k-means objectives, which also generalize to pth powers of lp norms.

Details

PDF