Learning from incomplete labeled data via adversarial data generation

Wentao Wang, Tyler Derr, Yao Ma, Suhang Wang, Hui Liu, Zitao Liu, Jiliang Tang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations


Positive and unlabeled (PU) learning aims to obtain a well-performing classifier from an incomplete binary training set, in which only some of the labels of one category are known while the rest are unknown. However, in many real-world applications such as image recognition, the collected data samples often involve more than two categories. Moreover, only a small portion of the collected samples might have associated labels for practical reasons, and these labeled samples cannot always cover all the categories. We refer to this type of data as incomplete labeled data. In this paper, we first formally define the incomplete labeled data learning problem and then tackle it via adversarial data generation. Specifically, we propose a novel generative framework, LILA, which can produce synthetic labeled samples for both partially labeled categories and unlabeled categories. To ensure that the generated samples for unlabeled categories are associated with correct labels, we integrate two active learning processes into the LILA framework that select unlabeled samples from the collected sample set and query their labels effectively. After LILA has been trained, a classifier can be trained on the balanced augmented data set consisting of both generated and original labeled samples. Extensive experiments on real image data demonstrate the effectiveness of our proposed framework. We release the implementation of the proposed framework at https://github.com/wentao-repo/LILA.

Original language: English (US)
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
Editors: Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781728183169
State: Published - Nov 2020
Externally published: Yes
Event: 20th IEEE International Conference on Data Mining, ICDM 2020 - Virtual, Sorrento, Italy
Duration: Nov 17 2020 → Nov 20 2020

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786


Conference: 20th IEEE International Conference on Data Mining, ICDM 2020
City: Virtual, Sorrento

All Science Journal Classification (ASJC) codes

  • General Engineering


Keywords

  • Active learning
  • Generative model
  • Incomplete labeled data

