TY - GEN
T1 - Active caching for similarity queries based on shared-neighbor information
AU - Houle, Michael E.
AU - Oria, Vincent
AU - Qasim, Umar
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - Novel applications such as recommender systems, uncertain databases, and multimedia databases are designed to process similarity queries that produce ranked lists of objects as their results. Similarity queries typically result in disk access latency and incur a substantial computational cost. In this paper, we propose an 'active caching' technique for similarity queries that is capable of synthesizing query results from cached information even when the required result list is not explicitly stored in the cache. Our solution, the Cache Estimated Significance (CES) model, is based on shared-neighbor similarity measures, which assess the strength of the relationship between two objects as a function of the number of other objects in the common intersection of their neighborhoods. The proposed method is general in that it does not require that the features be drawn from a metric space, nor does it require that the partial orders induced by the similarity measure be monotonic. Experimental results on real data sets show a substantial cache hit rate when compared with traditional caching approaches.
AB - Novel applications such as recommender systems, uncertain databases, and multimedia databases are designed to process similarity queries that produce ranked lists of objects as their results. Similarity queries typically result in disk access latency and incur a substantial computational cost. In this paper, we propose an 'active caching' technique for similarity queries that is capable of synthesizing query results from cached information even when the required result list is not explicitly stored in the cache. Our solution, the Cache Estimated Significance (CES) model, is based on shared-neighbor similarity measures, which assess the strength of the relationship between two objects as a function of the number of other objects in the common intersection of their neighborhoods. The proposed method is general in that it does not require that the features be drawn from a metric space, nor does it require that the partial orders induced by the similarity measure be monotonic. Experimental results on real data sets show a substantial cache hit rate when compared with traditional caching approaches.
KW - Performance
UR - http://www.scopus.com/inward/record.url?scp=78651294019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78651294019&partnerID=8YFLogxK
U2 - 10.1145/1871437.1871524
DO - 10.1145/1871437.1871524
M3 - Conference contribution
AN - SCOPUS:78651294019
SN - 9781450300995
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 669
EP - 678
BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
T2 - 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
Y2 - 26 October 2010 through 30 October 2010
ER -