TY - JOUR
T1 - The Influence of Hubness on NN-Descent
AU - Bratić, Brankica
AU - Houle, Michael E.
AU - Kurbalija, Vladimir
AU - Oria, Vincent
AU - Radovanović, Miloš
N1 - Funding Information:
M. E. Houle gratefully acknowledges the financial support of JSPS Kakenhi Kiban (B) Research Grant 18H03296. B. Bratić, V. Kurbalija and M. Radovanović thank the Serbian Ministry of Education, Science and Technological Development for support through project no. OI174023, “Intelligent Techniques and their Integration into Wide-Spectrum Decision Support.” V. Oria acknowledges the financial support of NSF Research Grants DGE 1565478 and ICER 1639683.
Publisher Copyright:
© 2019 World Scientific Publishing Company.
PY - 2019/9/1
Y1 - 2019/9/1
N2 - The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, creating the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but has a major drawback in that K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes bad NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
AB - The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, creating the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but has a major drawback in that K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes bad NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
KW - NN-Descent
KW - hubness
KW - k-nearest neighbor graph
UR - http://www.scopus.com/inward/record.url?scp=85072953049&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072953049&partnerID=8YFLogxK
U2 - 10.1142/S0218213019600029
DO - 10.1142/S0218213019600029
M3 - Article
AN - SCOPUS:85072953049
SN - 0218-2130
VL - 28
JO - International Journal on Artificial Intelligence Tools
JF - International Journal on Artificial Intelligence Tools
IS - 6
M1 - 1960002
ER -