TY - JOUR
T1 - A Generalized Approach for Reducing Expensive Distance Calls for A Broad Class of Proximity Problems
AU - Augustine, Jees
AU - Shetiya, Suraj
AU - Esfandiari, Mohammadreza
AU - Basu Roy, Senjuti
AU - Das, Gautam
N1 - Funding Information:
The research of Gautam Das was supported in part by grants #2008602, #1745925, and #1937143 from the National Science Foundation. The work of Mohammadreza Esfandiari and Senjuti Basu Roy are supported in parts by the National Science Foundation grants #1942913, #2007935, #1814595, and by the Office of Naval Research Grant No:N000141812838.
Publisher Copyright:
© 2021 ACM.
PY - 2021
Y1 - 2021
N2 - In this paper, we revisit a suite of popular proximity problems (such as, KNN, clustering, minimum spanning tree) that repeatedly perform distance computations to compare distances during their execution. Our effort here is to design principled solutions to minimize distance computations for such problems in general metric spaces, especially for the scenarios where calling an expensive oracle to resolve unknown distances are the dominant cost of the algorithms for these problems. We present a suite of techniques, including a novel formulation of the problem, that studies how distance comparisons between objects could be modelled as a system of linear inequalities that assists in saving distance computations, multiple graph based solutions, as well as a practitioners guide to adopt our solution frameworks to proximity problems. We compare our designed solutions conceptually and empirically with respect to a broad range of existing works. We finally present a comprehensive set of experimental results using multiple large scale real-world datasets and a suite of popular proximity algorithms to demonstrate the effectiveness of our proposed approaches.
AB - In this paper, we revisit a suite of popular proximity problems (such as, KNN, clustering, minimum spanning tree) that repeatedly perform distance computations to compare distances during their execution. Our effort here is to design principled solutions to minimize distance computations for such problems in general metric spaces, especially for the scenarios where calling an expensive oracle to resolve unknown distances are the dominant cost of the algorithms for these problems. We present a suite of techniques, including a novel formulation of the problem, that studies how distance comparisons between objects could be modelled as a system of linear inequalities that assists in saving distance computations, multiple graph based solutions, as well as a practitioners guide to adopt our solution frameworks to proximity problems. We compare our designed solutions conceptually and empirically with respect to a broad range of existing works. We finally present a comprehensive set of experimental results using multiple large scale real-world datasets and a suite of popular proximity algorithms to demonstrate the effectiveness of our proposed approaches.
KW - clustering
KW - k-nearest neighbour graphs
KW - lower bound computation
KW - metric space
KW - minimum spanning trees
KW - proximity problems
UR - http://www.scopus.com/inward/record.url?scp=85108947277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108947277&partnerID=8YFLogxK
U2 - 10.1145/3448016.3457303
DO - 10.1145/3448016.3457303
M3 - Conference article
AN - SCOPUS:85108947277
SN - 0730-8078
SP - 142
EP - 154
JO - Proceedings of the ACM SIGMOD International Conference on Management of Data
JF - Proceedings of the ACM SIGMOD International Conference on Management of Data
T2 - 2021 International Conference on Management of Data, SIGMOD 2021
Y2 - 20 June 2021 through 25 June 2021
ER -