TY - GEN
T1 - NN-Descent on high-dimensional data
AU - Bratić, Brankica
AU - Houle, Michael E.
AU - Kurbalija, Vladimir
AU - Oria, Vincent
AU - Radovanović, Miloš
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/6/25
Y1 - 2018/6/25
N2 - K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), which is a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. We link the performance of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
AB - K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n^2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n^1.14), which is a significant improvement over brute-force construction. However, NN-Descent has a major drawback: it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior and investigates possible solutions. We link the performance of NN-Descent to the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.
KW - Hubness
KW - K-nearest neighbor graph
KW - NN-Descent
UR - http://www.scopus.com/inward/record.url?scp=85053484571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053484571&partnerID=8YFLogxK
U2 - 10.1145/3227609.3227643
DO - 10.1145/3227609.3227643
M3 - Conference contribution
AN - SCOPUS:85053484571
T3 - ACM International Conference Proceeding Series
BT - WIMS 2018 - 8th International Conference on Web Intelligence, Mining and Semantics
A2 - Badica, Costin
A2 - Akerkar, Rajendra
A2 - Ivanovic, Mirjana
A2 - Savic, Milos
A2 - Radovanovic, Milos
A2 - Kim, Sang-Wook
A2 - Rosati, Riccardo
A2 - Manolopoulos, Yannis
PB - Association for Computing Machinery
T2 - 8th International Conference on Web Intelligence, Mining and Semantics, WIMS 2018
Y2 - 25 June 2018 through 27 June 2018
ER -