TY - GEN
T1 - Flexible aggregate similarity search in high-dimensional data sets
AU - Houle, Michael E.
AU - Ma, Xiguo
AU - Oria, Vincent
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum, avg or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as ‘flexible AggNN’, or FANN. In this work, we propose two approximation algorithms, one for the sum and avg variants of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient — even for real datasets of very high dimension.
AB - Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum, avg or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as ‘flexible AggNN’, or FANN. In this work, we propose two approximation algorithms, one for the sum and avg variants of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient — even for real datasets of very high dimension.
UR - http://www.scopus.com/inward/record.url?scp=84951853074&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951853074&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25087-8_2
DO - 10.1007/978-3-319-25087-8_2
M3 - Conference contribution
AN - SCOPUS:84951853074
SN - 9783319250861
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 15
EP - 28
BT - Similarity Search and Applications - 8th International Conference, SISAP 2015, Proceedings
A2 - Connor, Richard
A2 - Amato, Giuseppe
A2 - Falchi, Fabrizio
A2 - Gennaro, Claudio
PB - Springer Verlag
T2 - 8th International Conference on Similarity Search and Applications, SISAP 2015
Y2 - 12 October 2015 through 14 October 2015
ER -