Fairness and Explainability in Clustering Problems

Jia, Xinrui

doi:10.5075/epfl-thesis-9818

Jia, Xinrui

2023

Download

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DublinCore
EndNote
NLM
RefWorks
RIS

Files

Abstract

In this thesis we present and analyze approximation algorithms for three different clustering problems. The formulations of these problems are motivated by fairness and explainability considerations, two issues that have recently received attention in the algorithms and machine learning communities. In general, we are given a metric space and the task is to find a specified number of clusters to minimize some objective involving the centers of the clusters. Our first problem is the colorful k-center problem. In the classic k-center with outliers problem, the objective is to minimize the maximum distance from any point to its nearest center, while a given number of points can be omitted from contributing to the maximum. That is, these points are not served by any center. Colorful k-center is a generalization of this problem. Instead of only serving enough points, each point belongs to some color class and a certain number of points of each color are required to be served in a solution. When this problem was first introduced by [Ban+19], a pseudo-approximation using k + omega - 1 centers was given for general metric spaces where omega is the number of color classes. We give the first true approximation algorithm with k centers that gives a 3-approximation for colorful k-center. Next, we present our progress on the non-uniform k-center problem (NUkC or t-NUkC). In NUkC, we are given pairs (ki, ri), 1<=i<=t, sum ki =k, that represent ki balls of radius ri. The objective is to find Pt ki centers so that balls of radius alpha ri around ki of these centers cover all the points and alpha is minimized. NUkC was introduced by [CGK16], who conjectured that there exists a constant-factor approximation when t is constant. This is known as the NUkC conjecture. In a subsequent work, [CN21] gave a constant-factor approximation for t = 2 with outliers (robust 2-NUkC). We make progress on the NUkC conjecture by presenting a simple reduction of t-NUkC to robust (t-1)-NUkC, which, in light of [CN21], gives the first constant-factor approximation to 3-NUkC. Finally, we look at explainable algorithms for the k-medians and k-means objectives. Small decision trees are considered to be explainable models [Mur+19] for clustering, and in particular, when each node of the binary tree is a threshold on a single feature. Work by [Das+20] studied the theoretical guarantees obtainable from such a threshold tree, which they called the price of explainability. That is, the cost of an explainable clustering is compared to the cost of an optimal unrestricted clustering. We improve upon the results of [Das+20] to give nearly tight bounds on the price of explainability of the k-medians and k-means objectives, which also generalize to pth powers of lp norms.

Details

Title Fairness and Explainability in Clustering Problems

Author(s) Jia, Xinrui

Advisor(s)

Svensson, Ola Nils Anders

Pagination 110

Date 2023

Publisher Lausanne, EPFL

Keywords

k-center clustering; k-medians clustering; k-means clustering; fairness; interpretability; explainability; approximation algorithms

Language English

DOI https://doi.org/10.5075/epfl-thesis-9818

Laboratories THL2

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > THL2 - Theory of Computation Laboratory 2
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2023-07-24

Files

Abstract

Details

PDF