Nn-descent on high-dimensional data

Brankica Bratić, Michael E. Houle, Vladimir Kurbalija, Vincent Oria, Miloš Radovanović

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

K-nearest neighbor graphs (K-NNGs) are used in many data-mining and machine-learning algorithms. Naive construction of K-NNGs has a complexity of O(n2), which could be a problem for large-scale data sets. In order to achieve higher efficiency, many exact and approximate algorithms have been developed, including the NN-Descent algorithm of Dong, Charikar and Li. Empirical evidence suggests that the practical complexity of this algorithm is in Õ(n1.14), which is a significant improvement over brute force construction. However, NN-Descent has a major drawback — it produces good results only on data of low intrinsic dimensionality. This paper presents an experimental analysis of this behavior, and investigates possible solutions. We link the quality of performance of NN-Descent with the phenomenon of hubness, defined as the tendency of intrinsically high-dimensional data to contain hubs —points with high in-degrees in the K-NNG. We propose two approaches to alleviate the observed negative influence of hubs on NN-Descent performance.

Original languageEnglish (US)
Title of host publicationWIMS 2018 - 8th International Conference on Web Intelligence, Mining and Semantics
EditorsCostin Badica, Rajendra Akerkar, Mirjana Ivanovic, Milos Savic, Milos Radovanovic, Sang-Wook Kim, Riccardo Rosati, Yannis Manolopoulos
PublisherAssociation for Computing Machinery
ISBN (Electronic)9781450354899
DOIs
StatePublished - Jun 25 2018
Event8th International Conference on Web Intelligence, Mining and Semantics, WIMS 2018 - Novi Sad, Serbia
Duration: Jun 25 2018Jun 27 2018

Publication series

NameACM International Conference Proceeding Series

Other

Other8th International Conference on Web Intelligence, Mining and Semantics, WIMS 2018
CountrySerbia
CityNovi Sad
Period6/25/186/27/18

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Keywords

  • Hubness
  • K-nearest neighbor graph
  • NN-Descent

Fingerprint Dive into the research topics of 'Nn-descent on high-dimensional data'. Together they form a unique fingerprint.

Cite this