Abstract
In this paper, we revisit a suite of popular proximity problems (such as, KNN, clustering, minimum spanning tree) that repeatedly perform distance computations to compare distances during their execution. Our effort here is to design principled solutions to minimize distance computations for such problems in general metric spaces, especially for the scenarios where calling an expensive oracle to resolve unknown distances are the dominant cost of the algorithms for these problems. We present a suite of techniques, including a novel formulation of the problem, that studies how distance comparisons between objects could be modelled as a system of linear inequalities that assists in saving distance computations, multiple graph based solutions, as well as a practitioners guide to adopt our solution frameworks to proximity problems. We compare our designed solutions conceptually and empirically with respect to a broad range of existing works. We finally present a comprehensive set of experimental results using multiple large scale real-world datasets and a suite of popular proximity algorithms to demonstrate the effectiveness of our proposed approaches.
Original language | English (US) |
---|---|
Pages (from-to) | 142-154 |
Number of pages | 13 |
Journal | Proceedings of the ACM SIGMOD International Conference on Management of Data |
DOIs | |
State | Published - 2021 |
Externally published | Yes |
Event | 2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China Duration: Jun 20 2021 → Jun 25 2021 |
All Science Journal Classification (ASJC) codes
- Software
- Information Systems
Keywords
- clustering
- k-nearest neighbour graphs
- lower bound computation
- metric space
- minimum spanning trees
- proximity problems