TY - GEN
T1 - Learning from incomplete labeled data via adversarial data generation
AU - Wang, Wentao
AU - Derr, Tyler
AU - Ma, Yao
AU - Wang, Suhang
AU - Liu, Hui
AU - Liu, Zitao
AU - Tang, Jiliang
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Positive and unlabeled (PU) learning aims to obtain a well-performing classifier from an incomplete binary training set, in which only some of the labels of one category are known while the rest are unknown. However, in many real-world applications such as image recognition, the collected data samples often involve more than two categories. Moreover, only a small portion of the collected samples may have associated labels for practical reasons, and these labeled samples cannot always cover all the categories. We refer to this type of data as incomplete labeled data. In this paper, we first formally define the incomplete labeled data learning problem and then tackle it via adversarial data generation. Specifically, we propose a novel generative framework, LILA, which can produce synthetic labeled samples for both partially labeled categories and unlabeled categories. To ensure that the generated samples for unlabeled categories are associated with the correct labels, we integrate two active learning processes into the LILA framework that select unlabeled samples from the collected sample set and query their labels effectively. Once LILA has been trained, a classifier can be trained on the balanced augmented data set consisting of both generated and original labeled samples. Extensive experiments on real image data demonstrate the effectiveness of the proposed framework. We release the implementation of the proposed framework at https://github.com/wentao-repo/LILA.
KW - Active learning
KW - Generative model
KW - Incomplete labeled data
UR - http://www.scopus.com/inward/record.url?scp=85100878496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100878496&partnerID=8YFLogxK
U2 - 10.1109/ICDM50108.2020.00170
DO - 10.1109/ICDM50108.2020.00170
M3 - Conference contribution
AN - SCOPUS:85100878496
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1316
EP - 1321
BT - Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
A2 - Plant, Claudia
A2 - Wang, Haixun
A2 - Cuzzocrea, Alfredo
A2 - Zaniolo, Carlo
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th IEEE International Conference on Data Mining, ICDM 2020
Y2 - 17 November 2020 through 20 November 2020
ER -