Dimensional testing for multi-step similarity search

Michael E. Houle, Xiguo Ma, Michael Nett, Vincent Oria

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Scopus citations

Abstract

In data mining applications such as subspace clustering or feature selection, changes to the underlying feature set can require the reconstruction of search indices to support fundamental data mining tasks. For such situations, multi-step search approaches have been proposed that can accommodate changes in the underlying similarity measure without the need to rebuild the index. In this paper, we present a heuristic multi-step search algorithm that utilizes a measure of intrinsic dimension, the generalized expansion dimension (GED), as the basis of its search termination condition. Compared to the current state-of-the-art method, experimental results show that our heuristic approach is able to obtain significant improvements in both the number of candidates and the running time, while losing very little in the accuracy of the query results.

Original language: English (US)
Title of host publication: Proceedings - 12th IEEE International Conference on Data Mining, ICDM 2012
Pages: 299-308
Number of pages: 10
DOIs
State: Published - Dec 1 2012
Event: 12th IEEE International Conference on Data Mining, ICDM 2012 - Brussels, Belgium
Duration: Dec 10 2012 to Dec 13 2012

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 12th IEEE International Conference on Data Mining, ICDM 2012
Country: Belgium
City: Brussels
Period: 12/10/12 to 12/13/12

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Keywords

  • Adaptive similarity
  • Intrinsic dimensionality
  • Multi-step
  • Nearest neighbor
  • Similarity search
  • κ-NN
