SRGTNet: Subregion-Guided Transformer Hash Network for Fine-Grained Image Retrieval

  • Hongchun Lu
  • Songlin He
  • Xue Li
  • Min Han
  • Chase Wu

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, we make four contributions. (i) We build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images. (ii) We propose a part significant region erasure method that forces the network to generate compact binary codes. (iii) We introduce a CNN-guided Transformer structure for fine-grained retrieval tasks that captures the contextual feature relationships in fine-grained images in order to mine more discriminative regional features. (iv) We design a multistage mixture loss to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.
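As a generic illustration of why compact binary codes matter for retrieval, the sketch below binarizes real-valued features by sign and ranks database items by Hamming distance to a query. It is not the SRGTNet pipeline, whose details the abstract does not give; the function names (`binarize`, `hamming`, `rank_by_hamming`) and the 4-bit toy features are assumptions made purely for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> List[int]:
    """Map real-valued features to a binary code by sign thresholding.

    Assumption: a plain sign threshold stands in for whatever hash layer
    a real model would learn.
    """
    return [1 if x >= 0 else 0 for x in features]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the bit positions where two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query: Sequence[int],
                    database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered from nearest to farthest code.

    Ties are broken by the original index so the ordering is deterministic.
    """
    distances = [hamming(query, code) for code in database]
    return sorted(range(len(database)), key=lambda i: (distances[i], i))


# Hypothetical toy features; a trained network would produce these.
db_features = [
    [0.9, -0.2, 0.4, -0.7],   # binarizes to 1010
    [-0.5, 0.3, -0.1, 0.8],   # binarizes to 0101
    [0.6, -0.4, -0.3, -0.9],  # binarizes to 1000
]
query_features = [0.8, -0.1, 0.5, -0.6]  # binarizes to 1010

db_codes = [binarize(f) for f in db_features]
query_code = binarize(query_features)
ranking = rank_by_hamming(query_code, db_codes)
print(ranking)  # nearest-first database indices
```

In this toy setting, two images of visually similar subcategories would ideally differ in at least a few bits; low interclass variance makes that separation hard, which is the difficulty the abstract points to.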

Original language: English (US)
Pages (from-to): 2388-2400
Number of pages: 13
Journal: IEEE Transactions on Big Data
Volume: 11
Issue number: 5
DOIs
State: Published - 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management

Keywords

  • Fine-grained image retrieval
  • attention mechanism
  • convolutional neural network
  • hashing
