Abstract
Recent advances in deep neural networks rely heavily on large-scale labeled datasets. However, annotating large datasets can be difficult when annotation resources are limited. Active learning offers a promising solution by labeling only a small, strategically chosen subset of the unlabeled data. However, current active learning methods struggle with unevenly distributed data, which leads them to select subsets that fail to represent the entire dataset. To overcome this challenge, we introduce SPOT, a novel active learning algorithm that integrates SPace-filling (SP) designs with Optimal Transport (OT). SPOT uses optimal transport to handle data from complex manifolds by mapping them to a uniformly distributed hypercube. In addition, the space-filling design provides a better asymptotic convergence rate, so the selected subset covers the entire dataset more effectively than other sampling methods such as random sampling. Our extensive experiments across various image datasets and models demonstrate the superiority of SPOT over existing baselines.
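The abstract describes SPOT only at a high level. The Python sketch below illustrates its two ingredients, a transport map onto a uniform hypercube followed by space-filling selection. It is built on our own assumptions and is not the authors' algorithm. Specifically, we assume that the uniform target is approximated by n Halton points in [0,1]^d. With equal weights, discrete OT under squared Euclidean cost then reduces to an assignment problem, which we solve with a Hungarian algorithm. We also assume that the space-filling design is a simple midpoint grid. The names `hungarian`, `ot_to_hypercube`, `grid_design`, `spot_like_select`, and `demo_points` are hypothetical.

```python
import random
from itertools import product

# Illustrative sketch only (assumed design, not the paper's method):
# OT to a quasi-uniform hypercube sample, then grid-based space-filling selection.


def hungarian(cost):
    """Min-cost assignment for an n x m cost matrix with n <= m.
    Returns assignment[i] = column matched to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j] = row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow an alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def van_der_corput(i, base):
    x, denom = 0.0, 1.0
    while i > 0:
        i, rem = divmod(i, base)
        denom *= base
        x += rem / denom
    return x


def halton(n, dim):
    """First n Halton points in [0,1]^dim (this sketch supports dim <= 6)."""
    primes = [2, 3, 5, 7, 11, 13]
    if dim > len(primes):
        raise ValueError("dim too large for this sketch")
    return [[van_der_corput(i + 1, b) for b in primes[:dim]] for i in range(n)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ot_to_hypercube(points):
    """Equal-weight discrete OT: match each point to a distinct Halton point,
    minimizing total squared distance (a stand-in for a continuous OT map)."""
    targets = halton(len(points), len(points[0]))
    cost = [[sq_dist(x, y) for y in targets] for x in points]
    return [targets[j] for j in hungarian(cost)]


def grid_design(m, dim):
    """Midpoint grid with m points per axis: a simple space-filling design."""
    ticks = [(i + 0.5) / m for i in range(m)]
    return [list(c) for c in product(ticks, repeat=dim)]


def spot_like_select(points, m):
    """Pick, for each design point, the nearest not-yet-chosen mapped point."""
    mapped = ot_to_hypercube(points)
    design = grid_design(m, len(points[0]))
    if len(design) > len(points):
        raise ValueError("labeling budget exceeds pool size")
    chosen, taken = [], set()
    for z in design:
        best = min((i for i in range(len(points)) if i not in taken),
                   key=lambda i: sq_dist(mapped[i], z))
        chosen.append(best)
        taken.add(best)
    return chosen


def demo_points(seed=0):
    """Unevenly distributed toy pool: a dense cluster plus sparse outliers."""
    rng = random.Random(seed)
    dense = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(40)]
    sparse = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(10)]
    return dense + sparse


if __name__ == "__main__":
    print(spot_like_select(demo_points(), 2))  # indices of 4 points to label
```

Because the cost is squared Euclidean, the optimal matching is unchanged if the whole pool is translated or uniformly rescaled, so the toy data need no normalization. The sketch sticks to the standard library, and its O(n^3) assignment limits it to small pools. A realistic version would likely use a dedicated OT solver and a stronger space-filling design.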
| Original language | English (US) |
|---|---|
| Pages (from-to) | 1060-1074 |
| Number of pages | 15 |
| Journal | Big Data Mining and Analytics |
| Volume | 8 |
| Issue number | 5 |
| State | Published - 2025 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Computer Networks and Communications
- Artificial Intelligence
Keywords
- Active Learning (AL)
- Optimal Transport (OT)
- SPace-filling (SP)
- deep learning
- sampling