Flexible aggregate similarity search in high-dimensional data sets

Michael E. Houle, Xiguo Ma, Vincent Oria

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum, avg or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as ‘flexible AggNN’, or FANN. In this work, we propose two approximation algorithms, one for the sum and avg variants of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient — even for real datasets of very high dimension.

Original languageEnglish (US)
Title of host publicationSimilarity Search and Applications - 8th International Conference, SISAP 2015, Proceedings
EditorsRichard Connor, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro
PublisherSpringer Verlag
Pages15-28
Number of pages14
ISBN (Print)9783319250861
DOIs
StatePublished - 2015
Event8th International Conference on Similarity Search and Applications, SISAP 2015 - Glasgow, United Kingdom
Duration: Oct 12 2015Oct 14 2015

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9371
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other8th International Conference on Similarity Search and Applications, SISAP 2015
CountryUnited Kingdom
CityGlasgow
Period10/12/1510/14/15

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'Flexible aggregate similarity search in high-dimensional data sets'. Together they form a unique fingerprint.

Cite this