IFME: Information filtering by multiple examples with under-sampling in a digital library environment

Mingzhu Zhu, Chao Xu, Yi Fang Brook Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Scopus citations

Abstract

As the number of digitized documents grows exponentially, it becomes harder for users to keep up to date with the knowledge in their domain. In this paper, we present a framework named IFME (Information Filtering by Multiple Examples) for a digital library environment that helps users identify literature related to their interests by leveraging Positive Unlabeled learning (PU learning). Using a few relevant documents provided by a user (the positive set, called P) and treating the documents in an online database as unlabeled data (called U), it ranks the documents in U using a PU learning algorithm. Our experimental results showed that while the approach performed well when a large set of relevant feedback documents was available, it performed relatively poorly when the relevant feedback documents were few. We improved IFME by combining PU learning with under-sampling to tune the performance. Measured by Mean Average Precision (MAP), our experimental results indicated that with under-sampling, the performance improved significantly even when the size of P was small. We believe the PU learning based IFME framework offers insights for developing more effective digital library systems.
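
The abstract does not say which PU learning algorithm IFME uses or how under-sampling is applied, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a simple centroid-based scorer over bag-of-words vectors, under-samples U down to roughly the size of P in each round, treats the sample as pseudo-negatives, and averages each document's score over the rounds in which it was not sampled. All function names and parameters (rank_unlabeled, rounds, seed, and so on) are hypothetical. The last function computes average precision for one ranking; MAP is the mean of that value over several users or queries.

```python
import math
import random
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts, L2-normalised (an illustrative choice)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def centroid(vectors):
    """Mean vector of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w / len(vectors)
    return total

def dot(a, b):
    """Sparse dot product."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rank_unlabeled(positives, unlabeled, rounds=20, seed=0):
    """Return indices of `unlabeled`, best first.

    Assumed scheme (not taken from the paper): each round under-samples U
    to about |P| pseudo-negatives, then scores the remaining documents by
    similarity to the P centroid minus similarity to the sample centroid.
    """
    if not positives or not unlabeled:
        return list(range(len(unlabeled)))
    rng = random.Random(seed)
    p_cent = centroid([vectorize(d) for d in positives])
    u_vecs = [vectorize(d) for d in unlabeled]
    # Keep at least one document outside the sample whenever |U| > 1.
    k = max(1, min(len(positives), len(u_vecs) - 1))
    sums = [0.0] * len(u_vecs)
    seen = [0] * len(u_vecs)
    for _ in range(rounds):
        sampled = set(rng.sample(range(len(u_vecs)), k))
        n_cent = centroid([u_vecs[i] for i in sampled])
        for i, v in enumerate(u_vecs):
            if i not in sampled:  # score only documents left out of the sample
                sums[i] += dot(v, p_cent) - dot(v, n_cent)
                seen[i] += 1
    scores = [s / n if n else 0.0 for s, n in zip(sums, seen)]
    return sorted(range(len(u_vecs)), key=lambda i: scores[i], reverse=True)

def average_precision(ranking, relevant):
    """Average precision of one ranking given the set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Calling rank_unlabeled(P, U) with two lists of document strings returns positions in U ordered from most to least similar to the user's examples. Passing that ranking and the set of truly relevant positions to average_precision gives the per-user value that would be averaged into MAP.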

Original language: English (US)
Title of host publication: JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Pages: 107-110
Number of pages: 4
State: Published - 2013
Event: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013 - Indianapolis, IN, United States
Duration: Jul 22 2013 to Jul 26 2013

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013
Country/Territory: United States
City: Indianapolis, IN
Period: 7/22/13 to 7/26/13

All Science Journal Classification (ASJC) codes

  • General Engineering

Keywords

  • Information retrieval
  • Positive unlabeled learning
  • Relevance feedback
  • Search by multiple examples
  • Text classification
