ACE: Algorithm-Independent Acceleration and Parallelization of Clustering Implementations

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Clustering is a key technique in a wide range of data analysis tasks. However, algorithms that ensure stable, deterministic, accurate clustering are computationally expensive, having superlinear complexity for both memory and time. Therefore, even when using HPC hardware, there are hard limits to dataset sizes that can be clustered, as clustering implementations can run out of memory or take unacceptably long. We introduce an approach called ACE  that applies algorithm-independent, black-box parallelization to superlinear sequential clustering algorithms, thereby making the clustering of substantial datasets feasible, even on commodity desktop/laptop systems. ACE starts by partitioning data to fit onto a given machine, and via divide-and-conquer, reduce the complexity of clustering steps. Next, ACE uses parallel, automated hyperparameter search to find optimal parameters for the current dataset. Finally, ACE aggregates intermediate results effectively and efficiently so that the final clustering output does not sacrifice clustering quality compared to the original algorithm. An evaluation on four popular clustering algorithms – Affinity Propagation, DBSCAN, Hierarchical Agglomerative Clustering, and Spectral Clustering – shows that ACE substantially reduces memory requirements and achieves linear processing time. ACE was able to process an entire suite of 164 datasets, including substantial datasets with 1.4M points or 1,000 dimensions, whereas the default implementations failed to process between 15 and 149 datasets from the suite. Moreover, for those datasets that could be processed by the default implementations, ACE achieved a 1.13x–102x time reduction.

Original languageEnglish (US)
Title of host publicationParallel Processing and Applied Mathematics - 15th International Conference, PPAM 2024, Revised Selected Papers
EditorsRoman Wyrzykowski, Jack Dongarra, Ewa Deelman, Konrad Karczewski
PublisherSpringer Science and Business Media Deutschland GmbH
Pages161-176
Number of pages16
ISBN (Print)9783031856969
DOIs
StatePublished - 2025
Event15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024 - Ostrava, Czech Republic
Duration: Sep 8 2024Sep 11 2024

Publication series

NameLecture Notes in Computer Science
Volume15579
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference15th International Conference on Parallel Processing and Applied Mathematics, PPAM 2024
Country/TerritoryCzech Republic
CityOstrava
Period9/8/249/11/24

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Automatic Parallelization
  • Clustering
  • Machine Learning

Fingerprint

Dive into the research topics of 'ACE: Algorithm-Independent Acceleration and Parallelization of Clustering Implementations'. Together they form a unique fingerprint.

Cite this