Proportionate Diversification of Top-k LLM Results using Database Queries

Thinh On, Subhodeep Ghosh, Mengnan Du, Senjuti Basu Roy

Research output: Contribution to journal › Conference article › peer-review


Result diversification aims to return relevant results that cover a variety of perspectives. Attribute-based diversification groups results by shared attributes (e.g., genre for movies) and selects a proportional number of items from each group based on their distribution in the underlying data. However, large language models (LLMs) are not designed to produce proportionally diverse results. In this work, we propose leveraging external data sources to determine the distribution of groups related to a query and prompt LLMs to produce proportionally diverse results. This can improve result diversity by representing groups in proportion to their prevalence. Specifically, we first argue for the benefits of making top-k results from LLMs proportionally diverse. We then show how to use external benchmark databases to enable proportional diversity. Finally, we outline a framework that prompts LLMs with proportionality information from external data and discuss challenges in automating this process. Our approach provides a path to overcoming LLMs' limitations in producing proportionally diverse responses.
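The idea of deriving group proportions from an external database and passing them to an LLM prompt can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the `movies` table and its `genre` column, the helper names `group_counts`, `proportional_quotas`, and `build_prompt`, the prompt wording, and the use of largest-remainder rounding to turn proportions into integer quotas.

```python
import sqlite3


def group_counts(conn, table, attribute):
    """Count rows per attribute value in a (hypothetical) external table."""
    # SQL identifiers cannot be bound as parameters; this sketch assumes
    # that `table` and `attribute` are trusted names.
    query = f"SELECT {attribute}, COUNT(*) FROM {table} GROUP BY {attribute}"
    return dict(conn.execute(query).fetchall())


def proportional_quotas(counts, k):
    """Split k result slots across groups in proportion to their counts.

    Uses largest-remainder rounding (an assumed choice) so quotas sum to k.
    """
    total = sum(counts.values())
    if total == 0 or k <= 0:
        return {group: 0 for group in counts}
    exact = {group: k * count / total for group, count in counts.items()}
    quotas = {group: int(share) for group, share in exact.items()}
    leftover = k - sum(quotas.values())
    # Hand the remaining slots to the largest fractional parts,
    # breaking ties by group name so the result is deterministic.
    order = sorted(counts, key=lambda g: (-(exact[g] - quotas[g]), str(g)))
    for group in order[:leftover]:
        quotas[group] += 1
    return quotas


def build_prompt(question, attribute, quotas):
    """Compose a prompt that states the per-group quotas (wording is assumed)."""
    k = sum(quotas.values())
    lines = [
        f"- {n} with {attribute} = {group}"
        for group, n in sorted(quotas.items())
        if n > 0
    ]
    header = f"{question}\nReturn exactly {k} results, distributed as follows:"
    return header + "\n" + "\n".join(lines)


def demo():
    """Build a toy in-memory database and derive quotas for a top-4 query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (title TEXT, genre TEXT)")
    rows = (
        [(f"drama-{i}", "drama") for i in range(5)]
        + [(f"comedy-{i}", "comedy") for i in range(3)]
        + [(f"horror-{i}", "horror") for i in range(2)]
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)
    counts = group_counts(conn, "movies", "genre")
    quotas = proportional_quotas(counts, k=4)
    conn.close()
    prompt = build_prompt("Recommend movies to watch.", "genre", quotas)
    return counts, quotas, prompt


if __name__ == "__main__":
    print(demo()[2])
```

With the toy data above (5 drama, 3 comedy, 2 horror) and k = 4, the quotas come out as 2 drama, 1 comedy, and 1 horror. The resulting prompt would then be sent to an LLM of choice, a step this sketch leaves out.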

Original language: English (US)
Journal: CEUR Workshop Proceedings
State: Published - 2023
Event: Joint Workshops at the 49th International Conference on Very Large Data Bases, VLDBW 2023 - Vancouver, Canada
Duration: Aug 28 2023 → Sep 1 2023

All Science Journal Classification (ASJC) codes

  • General Computer Science


Keywords

  • large language models (LLMs)
  • prompting LLMs
  • querying database
  • top-k diversification
