Improving K-means Clustering Using Speculation

Igescu, Stefan; Sanca, Viktor; Zapridou, Eleni; Ailamaki, Anastasia

2023


Abstract

K-means is one of the fundamental unsupervised data clustering and machine learning methods. It has been well studied over the years: parallelized, approximated, and optimized for different cases and applications. As increasing parallelism makes processing bandwidth-bound, speeding up iterative k-means further would require reducing the data through sampling, at the cost of accuracy. We examine the use of speculation techniques to expedite the convergence of the iterative k-means algorithm without affecting the accuracy of the results. We achieve this by introducing two cooperative and concurrent phases: one works on the overall input data, and the other speculates and explores the space faster using sampling. At the end of every iteration, the two phases vote and choose the best centroids according to the objective function. Our speculative technique reduces the number of steps needed for convergence without compromising accuracy and, at the same time, provides through resampling an effective mechanism to escape local minima without prior initialization cost.
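
The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`speculative_kmeans`, `lloyd_step`, `sse`), the parameters (`sample_size`, `spec_steps`, `tol`), and the choice to run the two phases one after the other rather than concurrently are all assumptions made for illustration. The "vote" is modeled here simply as keeping whichever candidate set of centroids has the lower sum of squared errors on the full input.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def sse(points, centroids):
    """k-means objective: sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def lloyd_step(points, centroids):
    """One assignment + update step; an empty cluster keeps its old centroid."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    new = []
    for j, c in enumerate(centroids):
        if counts[j]:
            new.append(tuple(s / counts[j] for s in sums[j]))
        else:
            new.append(tuple(c))
    return new

def speculative_kmeans(points, k, sample_size, spec_steps=3,
                       max_iters=50, tol=1e-9, seed=0):
    """Hypothetical sketch: full-data step vs. sampled speculative steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # assumed random initialization
    history = [sse(points, centroids)]
    for _ in range(max_iters):
        # Phase 1: one ordinary step over the full input.
        full = lloyd_step(points, centroids)
        # Phase 2 (speculative): several cheap steps on a fresh sample.
        sample = rng.sample(points, sample_size)
        spec = centroids
        for _ in range(spec_steps):
            spec = lloyd_step(sample, spec)
        # "Vote": keep the candidate with the lower objective on the full data.
        full_cost, spec_cost = sse(points, full), sse(points, spec)
        if spec_cost < full_cost:
            centroids, cost = spec, spec_cost
        else:
            centroids, cost = full, full_cost
        history.append(cost)
        if history[-2] - cost <= tol:          # no further improvement
            break
    return centroids, history

# Usage: two small, well-separated grids of points.
points = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
points += [(10 + 0.1 * x, 10 + 0.1 * y) for x in range(5) for y in range(5)]
centroids, history = speculative_kmeans(points, k=2, sample_size=20)
```

Because the speculative candidate is accepted only when it beats the full-data step on the full objective, the objective in this sketch never gets worse than plain Lloyd iteration would make it in the same step. Drawing a fresh sample every iteration is one simple way to mimic the resampling mentioned in the abstract.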

Details

Title Improving K-means Clustering Using Speculation

Author(s) Igescu, Stefan ; Sanca, Viktor ; Zapridou, Eleni ; Ailamaki, Anastasia

Published in Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23)

Pagination 12

Presented at Workshop on Applied AI for Database Systems and Applications (AIDB’23), Vancouver, Canada, August 28 - September 1, 2023

Date 2023-08-01

Keywords (free)

K-Means; Speculative Execution; Sampling; AI for DB; Clustering; Parallelization; Algorithm Co-Design; Analytics

Laboratories DIAS

The document appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > DIAS - Data-Intensive Applications and Systems Laboratory
Peer-reviewed publications
Conference papers
Work produced at EPFL

Record creation date 2023-08-14
