UpGen: Unleashing Potential of Foundation Models for Training-Free Camouflage Detection via Generative Models

  • Ji Du
  • Jiesheng Wu
  • Desheng Kong
  • Weiyun Liang
  • Fangwei Hao
  • Jing Xu
  • Bin Wang
  • Guiling Wang
  • Ping Li

Research output: Contribution to journal › Article › peer-review

Abstract

Camouflaged Object Detection (COD) aims to segment objects that closely resemble their surroundings. To avoid the extensive annotation and complex optimization required by supervised learning, recent prompt-based segmentation methods extract informative prompts from Large Vision-Language Models (LVLMs), refine them with various foundation models, and feed the results into the Segment Anything Model (SAM) for segmentation. However, owing to LVLM hallucinations and insufficient image-prompt interaction during the refinement stage, these prompts often fail to provide reliable class discrimination and localization of camouflaged objects, degrading performance. To supply SAM with more informative prompts, we present UpGen, a pipeline that prompts SAM with generative prompts without requiring training, marking a novel integration of generative models with LVLMs. Specifically, we propose the Multi-Student-Single-Teacher (MSST) knowledge integration framework to alleviate LVLM hallucinations; it integrates insights from multiple sources to improve the classification of camouflaged objects. To strengthen interaction during prompt refinement, we are the first to leverage generative models on real camouflage images to produce SAM-style prompts without fine-tuning. By capitalizing on the distinctive learning mechanism and structure of generative models, we enable effective image-prompt interaction and generate highly informative prompts for SAM. Extensive experiments demonstrate that UpGen outperforms weakly-supervised models and SAM-based counterparts. We also integrate our framework into existing weakly-supervised methods to generate pseudo-labels, yielding consistent performance gains. Moreover, with minor adjustments, UpGen shows promising results in open-vocabulary COD, referring COD, salient object detection, marine animal segmentation, and transparent object segmentation.

Original language: English (US)
Pages (from-to): 5400-5413
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 34
DOIs
State: Published - 2025
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Graphics and Computer-Aided Design

Keywords

  • COD
  • SAM
  • generative models
  • large vision-language models
  • prompt-based segmentation
