Search by multiple examples

Mingzhu Zhu, Yi Fang Brook Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

It is often difficult for users to choose keywords that express their information needs. Search-By-Multiple-Examples (SBME), a promising method for overcoming this problem, allows users to specify their information needs as a set of relevant documents rather than as a set of keywords. Most studies on SBME adopt Positive-Unlabeled learning (PU learning) techniques, treating the examples provided by the user (denoted as query examples) as the positive set and the entire data collection as the unlabeled set. However, treating the entire collection as the unlabeled set is inefficient, as the collection can be huge. In addition, these studies treat the query examples as relevant to a single topic, whereas the examples are often relevant to multiple topics. Because the query examples are far fewer than the unlabeled data, system performance may also degrade dramatically due to the class imbalance problem. Moreover, the experiments conducted in these studies have not taken into account the settings of online search, which differ greatly from those of controlled experiments. This proposed research seeks to improve SBME by exploring: (1) how to predict users' information needs by modeling the content of the documents with probabilistic topic models; and (2) how to deal with the class imbalance problem by reducing the size of the unlabeled data and adopting machine learning techniques. We will also conduct extensive experiments that use query example sets of different sizes to simulate users' information needs, in order to better evaluate SBME.
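The PU-learning baseline described above, with query examples as the positive set P and the collection as the unlabeled set U, can be illustrated with a small two-step ranker. This is a minimal sketch under stated assumptions, not the authors' method. It uses a simplified Rocchio-style reliable-negative step (a common first step in two-step PU learning) instead of the topic models and learners the proposal describes. The function names, the whitespace tokenizer, and the `max_unlabeled` random subsampling used to shrink U are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical helper names for illustration; not taken from the paper.


def tf_vector(text):
    """Lower-cased bag-of-words term-frequency vector (naive whitespace tokens)."""
    return Counter(text.lower().split())


def norm(v):
    return math.sqrt(sum(w * w for w in v.values()))


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    denom = norm(a) * norm(b)
    return dot / denom if denom else 0.0


def centroid(vectors):
    """Mean of the length-normalised vectors (empty Counter for no input)."""
    total = Counter()
    for v in vectors:
        n = norm(v) or 1.0
        for t, w in v.items():
            total[t] += w / n
    return Counter({t: w / len(vectors) for t, w in total.items()}) if vectors else Counter()


def rank_by_examples(query_examples, collection, max_unlabeled=None, seed=0):
    """Rank collection indices by estimated relevance to the query examples.

    P = query examples (positive set), U = collection (unlabeled set).
    """
    pos = [tf_vector(d) for d in query_examples]
    docs = [tf_vector(d) for d in collection]

    # Hypothetical size reduction: learn from a random sample of U only.
    unl_ids = list(range(len(docs)))
    if max_unlabeled is not None and len(unl_ids) > max_unlabeled:
        unl_ids = random.Random(seed).sample(unl_ids, max_unlabeled)

    # Step 1 (simplified Rocchio-style): unlabeled docs closer to the
    # unlabeled centroid than to the positive centroid are reliable negatives.
    c_pos = centroid(pos)
    c_unl = centroid([docs[i] for i in unl_ids])
    negatives = [i for i in unl_ids
                 if cosine(docs[i], c_unl) > cosine(docs[i], c_pos)]
    c_neg = centroid([docs[i] for i in negatives])

    # Step 2: score every document in the collection against both centroids.
    scores = [cosine(d, c_pos) - cosine(d, c_neg) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = ["neural network training", "deep neural network"]
    docs = ["training a deep neural network", "stock market prices fall",
            "neural network pruning", "market prices and stock trading"]
    print(rank_by_examples(query, docs)[:2])  # [0, 2]
```

Because each class is summarized by a mean vector, the score is not directly weighted by how many documents fall in each class; a discriminative learner such as an SVM would instead need class weights or resampling to cope with imbalance. The single positive centroid also assumes that the query examples share one topic, which is exactly the assumption the abstract questions. Finally, a small random sample of U can be unrepresentative and may mislabel relevant documents as negatives.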

Original language: English (US)
Title of host publication: WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining
Publisher: Association for Computing Machinery
Pages: 667-671
Number of pages: 5
ISBN (Print): 9781450323512
State: Published - 2014
Event: 7th ACM International Conference on Web Search and Data Mining, WSDM 2014 - New York, NY, United States
Duration: Feb 24 2014 to Feb 28 2014

Publication series

Name: WSDM 2014 - Proceedings of the 7th ACM International Conference on Web Search and Data Mining

Other

Other: 7th ACM International Conference on Web Search and Data Mining, WSDM 2014
Country/Territory: United States
City: New York, NY
Period: 2/24/14 to 2/28/14

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems

Keywords

  • information retrieval
  • positive unlabeled learning
  • search by multiple examples
  • transductive inference

