A generic framework for efficient computation of top-k diverse results

Md Mouinul Islam, Mahsa Asadi, Sihem Amer-Yahia, Senjuti Basu Roy

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. There are numerous algorithms that return top-k results that are both diverse and relevant. These algorithms typically have computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive DivGetBatch() that replaces repeated pairwise comparisons of diversity scores of records by pairwise comparisons of “aggregate” diversity scores of a group of records, thereby improving the running time of these algorithms while preserving the same results. We integrate the access primitive inside three representative diversity algorithms and prove that the augmented algorithms leveraging the access primitive preserve original results. We analyze the worst and expected case running times of these algorithms. We propose a computational framework to design this access primitive that has a pre-computed index structure I-tree that is agnostic to the specific details of diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings, while ensuring up to a 24 × speedup.

Original languageEnglish (US)
Pages (from-to)737-761
Number of pages25
JournalVLDB Journal
Volume32
Issue number4
DOIs
StatePublished - Jul 2023

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Hardware and Architecture

Keywords

  • Diversification
  • Query processing
  • Top-k algorithms

Fingerprint

Dive into the research topics of 'A generic framework for efficient computation of top-k diverse results'. Together they form a unique fingerprint.

Cite this