Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces

Michael E. Houle, Xiguo Ma, Vincent Oria

Research output: Contribution to journalArticlepeer-review

7 Scopus citations

Abstract

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining, and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q , the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the $k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as 'flexible AggNN', or FANN. In this work, we propose two approximation algorithms, one for the sum variant of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient.

Original languageEnglish (US)
Article number7317853
Pages (from-to)3258-3273
Number of pages16
JournalIEEE Transactions on Knowledge and Data Engineering
Volume27
Issue number12
DOIs
StatePublished - Dec 2015

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics

Keywords

  • Similarity search
  • aggregation
  • dimensional test
  • intrinsic dimensionality

Fingerprint

Dive into the research topics of 'Effective and Efficient Algorithms for Flexible Aggregate Similarity Search in High Dimensional Spaces'. Together they form a unique fingerprint.

Cite this