SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples

  • Yin Wang
  • Zixuan Wang
  • Hao Lu
  • Zhen Qin
  • Hailiang Zhao
  • Guanjie Cheng
  • Xin Du
  • Ge Su
  • Li Kuang
  • Mengchu Zhou
  • Shuiguang Deng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Semi-Supervised Learning (SSL) can leverage abundant unlabeled data to boost model performance. However, the class-imbalanced data distribution in real-world scenarios poses great challenges to SSL, resulting in performance degradation. Existing class-imbalanced semi-supervised learning (CISSL) methods mainly focus on rebalancing datasets but ignore the potential of using hard examples to enhance performance, making it difficult to fully harness the power of unlabeled data even with sophisticated algorithms. To address this issue, we propose a method that enhances the performance of Imbalanced Semi-Supervised Learning by Mining Hard Examples (SeMi). This method distinguishes the entropy differences among logits of hard and easy examples, thereby identifying hard examples and increasing the utility of unlabeled data, better addressing the imbalance problem in CISSL. In addition, we maintain a class-balanced memory bank with confidence decay for storing high-confidence embeddings to enhance the pseudo-labels' reliability. Although our method is simple, it is effective and seamlessly integrates with existing approaches. We perform comprehensive experiments on standard CISSL benchmarks and experimentally demonstrate that our proposed SeMi outperforms existing state-of-the-art methods on multiple benchmarks, especially in reversed scenarios, where our best result shows approximately a 54.8% improvement over the baseline methods. Our code is available at https://github.com/pywin/SeMi.
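
The entropy-based separation of hard and easy examples described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact criterion, so everything here is an assumption made for illustration: the function names, the use of softmax entropy over per-example logits, and the fixed `threshold` value are hypothetical and are not taken from the SeMi code.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def prediction_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution of each row."""
    probs = softmax(logits)
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)


def split_hard_easy(logits, threshold):
    """Flag examples whose prediction entropy exceeds `threshold` as hard.

    The fixed threshold is a hypothetical stand-in; the paper may use a
    different rule to separate hard from easy examples.
    """
    entropy = prediction_entropy(logits)
    return entropy > threshold, entropy


# Toy logits for three unlabeled examples over three classes.
logits = np.array([
    [8.0, 0.0, 0.0],   # confident prediction, low entropy
    [1.0, 0.9, 1.1],   # nearly uniform prediction, high entropy
    [0.0, 6.0, 0.5],   # confident prediction, low entropy
])
hard_mask, entropy = split_hard_easy(logits, threshold=0.5)
print(hard_mask, entropy)
```

In this toy batch only the second example, whose logits are almost uniform, is flagged as hard; a real pipeline would have to choose the threshold, or a replacement rule, based on the method's actual definition.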

Original language: English (US)
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 499-507
Number of pages: 9
ISBN (Electronic): 9798400720352
State: Published - Oct 27 2025
Externally published: Yes
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: Oct 27 2025 → Oct 31 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 10/27/25 → 10/31/25

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Software
  • Artificial Intelligence
  • Computer Graphics and Computer-Aided Design

Keywords

  • class-imbalanced data
  • multimodal support
  • pseudo-labeling
  • semi-supervised learning
