The Influence of Hubness on NN-Descent

Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, Miloš Radovanović

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


The K-nearest neighbor graph (K-NNG) is a data structure used by many machine-learning algorithms. Naive computation of the K-NNG has quadratic time complexity, which in many cases is not efficient enough, producing the need for fast and accurate approximation algorithms. NN-Descent is one such algorithm that is highly efficient, but has a major drawback in that K-NNG approximations are accurate only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. Experimental results show that there is a link between the performance of NN-Descent and the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs: points with large in-degrees in the K-NNG. First, we explain how the presence of the hubness phenomenon causes bad NN-Descent performance. In light of that, we propose four NN-Descent variants to alleviate the observed negative influence of hubs. By evaluating the proposed approaches on several real and synthetic data sets, we conclude that our approaches are more accurate, but often at the cost of higher scan rates.
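To make the abstract's setting concrete, the following is a minimal illustrative sketch (not the paper's method) of the naive quadratic K-NNG construction it refers to, together with the in-degree computation under which hubs appear as points with unusually large counts. Function names and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def knn_graph(points, k):
    """Naive K-NNG construction: computes all O(n^2) pairwise
    distances, then keeps the k nearest neighbors of each point."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def in_degrees(nn):
    """In-degree of each point in the K-NNG; hubs are the points
    whose in-degree greatly exceeds the mean (which equals k)."""
    return np.bincount(nn.ravel(), minlength=nn.shape[0])

# Illustrative run: intrinsically high-dimensional Gaussian data,
# where hubness is expected to emerge.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 50))
graph = knn_graph(x, k=10)
degrees = in_degrees(graph)
```

NN-Descent avoids the quadratic distance matrix above by iteratively refining an initially random neighbor graph via neighbor-of-neighbor comparisons; the sketch here only shows the exact baseline and the hubness measurement.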

Original language: English (US)
Article number: 1960002
Journal: International Journal on Artificial Intelligence Tools
Issue number: 6
State: Published - Sep 1 2019

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence


Keywords

  • NN-Descent
  • hubness
  • k-nearest neighbor graph


