SRGTNet: Subregion-guided Transformer Hash Network for Fine-Grained Image Retrieval

Hongchun Lu, Songlin He, Xue Li, Min Han, Chase Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Fine-grained image retrieval (FGIR) is a crucial task in computer vision, with broad applications in areas such as biodiversity monitoring, e-commerce, and medical diagnostics. However, capturing discriminative feature information to generate binary codes is difficult because of high intraclass variance and low interclass variance. To address this challenge, we make four contributions. (i) We build a novel and highly reliable fine-grained deep hash learning framework for more accurate retrieval of fine-grained images. (ii) We propose a part significant region erasure method that forces the network to generate compact binary codes. (iii) We introduce a CNN-guided Transformer structure for fine-grained retrieval tasks that captures the contextual feature relationships in fine-grained images in order to mine more discriminative regional features. (iv) We design a multistage mixture loss to optimize network training and enhance feature representation. Experiments were conducted on three publicly available fine-grained datasets. The results show that our method effectively improves the performance of fine-grained image retrieval.
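For readers unfamiliar with hashing-based retrieval, the following is a minimal sketch of the general idea behind binary codes: real-valued features are thresholded into bits, and retrieval ranks database items by Hamming distance to the query code. It is not the SRGTNet architecture, erasure method, or loss; the function names (binarize, hamming_distance, retrieve) and the sign-threshold rule are assumptions made purely for illustration.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack the sign bits of a feature vector into an integer hash code.

    Assumption: a positive value maps to bit 1, anything else to bit 0.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: List[int], top_k: int = 3) -> List[int]:
    """Return indices of the top_k database codes closest to the query."""
    order = sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )
    return order[:top_k]
```

Compact codes make this ranking cheap, since each comparison reduces to an XOR and a bit count; the difficulty highlighted in the abstract lies in learning features whose bits separate visually similar classes.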

Original language: English (US)
Journal: IEEE Transactions on Big Data
DOIs
State: Accepted/In press - 2025

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management

Keywords

  • attention mechanism
  • convolutional neural network
  • Fine-grained image retrieval
  • hashing
