Abstract
Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, (i) we build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images; (ii) we propose a part significant region erasure method that forces the network to generate compact binary codes; (iii) we introduce a CNN-guided Transformer structure for fine-grained retrieval tasks that effectively captures contextual feature relationships in fine-grained images to mine more discriminative regional features; and (iv) we design a multistage mixture loss to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.
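The abstract does not say how the learned features become binary codes or how codes are compared at query time. A common convention in deep hashing, assumed here and not taken from the paper, is to quantize each real-valued embedding with the sign function and rank database items by Hamming distance. The minimal sketch below illustrates only that generic retrieval step; the function names (`to_binary_code`, `hamming_distance`, `rank_database`) are hypothetical, and the sketch does not model the framework's erasure module, CNN-guided Transformer, or loss.

```python
from typing import List, Sequence, Tuple


def to_binary_code(features: Sequence[float]) -> List[int]:
    """Quantize a real-valued embedding to a {-1, +1} hash code via sign().

    Assumption: zero maps to +1 so that every bit is defined.
    """
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def rank_database(
    query: Sequence[int], database: Sequence[Sequence[int]]
) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by ascending Hamming distance."""
    scored = [(i, hamming_distance(query, code)) for i, code in enumerate(database)]
    # sorted() is stable, so items with equal distance keep database order.
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 4-bit embeddings, for illustration only.
    query = to_binary_code([0.8, -0.3, 0.1, -0.9])
    database = [
        to_binary_code(v)
        for v in ([0.5, -0.2, 0.4, -0.1], [-0.7, 0.6, -0.2, 0.3], [0.9, 0.2, 0.3, -0.5])
    ]
    print(rank_database(query, database))
```

For {-1, +1} codes of length K, the Hamming distance equals (K minus the inner product of the two codes) divided by 2, which is why many hashing methods optimize inner products during training and use Hamming ranking at retrieval time.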
Original language | English (US) |
---|---|
Journal | IEEE Transactions on Big Data |
DOIs | |
State | Accepted/In press - 2025 |
All Science Journal Classification (ASJC) codes
- Information Systems
- Information Systems and Management
Keywords
- attention mechanism
- convolutional neural network
- Fine-grained image retrieval
- hashing