A Generalized Approach for Reducing Expensive Distance Calls for A Broad Class of Proximity Problems

Jees Augustine, Suraj Shetiya, Mohammadreza Esfandiari, Senjuti Basu Roy, Gautam Das

Research output: Contribution to journalConference articlepeer-review

Abstract

In this paper, we revisit a suite of popular proximity problems (such as, KNN, clustering, minimum spanning tree) that repeatedly perform distance computations to compare distances during their execution. Our effort here is to design principled solutions to minimize distance computations for such problems in general metric spaces, especially for the scenarios where calling an expensive oracle to resolve unknown distances are the dominant cost of the algorithms for these problems. We present a suite of techniques, including a novel formulation of the problem, that studies how distance comparisons between objects could be modelled as a system of linear inequalities that assists in saving distance computations, multiple graph based solutions, as well as a practitioners guide to adopt our solution frameworks to proximity problems. We compare our designed solutions conceptually and empirically with respect to a broad range of existing works. We finally present a comprehensive set of experimental results using multiple large scale real-world datasets and a suite of popular proximity algorithms to demonstrate the effectiveness of our proposed approaches.

Original languageEnglish (US)
Pages (from-to)142-154
Number of pages13
JournalProceedings of the ACM SIGMOD International Conference on Management of Data
DOIs
StatePublished - 2021
Externally publishedYes
Event2021 International Conference on Management of Data, SIGMOD 2021 - Virtual, Online, China
Duration: Jun 20 2021Jun 25 2021

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Keywords

  • clustering
  • k-nearest neighbour graphs
  • lower bound computation
  • metric space
  • minimum spanning trees
  • proximity problems

Fingerprint

Dive into the research topics of 'A Generalized Approach for Reducing Expensive Distance Calls for A Broad Class of Proximity Problems'. Together they form a unique fingerprint.

Cite this