Abstract
Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, we (i) build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images. (ii) We propose a part significant region erasure method that forces the network to generate compact binary codes. (iii) We introduce a CNN-guided Transformer structure for use in fine-grained retrieval tasks to capture fine-grained images effectively in contextual feature relationships to mine more discriminative regional features. (iv) A multistage mixture loss is designed to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2388-2400 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Big Data |
| Volume | 11 |
| Issue number | 5 |
| DOIs | |
| State | Published - 2025 |
| Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Information Systems
- Information Systems and Management
Keywords
- Fine-grained image retrieval
- attention mechanism
- convolutional neural network
- hashing