Few-Shot Photovoltaic Film Defect Detection with Contextual Ensemble Language-Image Multimodal Network

Huiyan Wang, Ruihao Peng, Yiheng Zhu, Jiachen Li, Mengchu Zhou, Ming Ying, Shouguang Wang

Research output: Contribution to journal > Article > peer-review

Abstract

The automatic detection of defects in industrial photovoltaic film is crucial for ensuring the reliability of photovoltaic modules. Key challenges include limited defect samples, interclass feature similarities, and interference from complex backgrounds. Existing deep learning-based methods require large-scale datasets and focus solely on visual data, which limits their effectiveness in few-shot defect detection (FSDD). To address these challenges, we propose the Contextual Ensemble Language-Image multimodal Network (CELIN), which enhances FSDD in photovoltaic films by incorporating textual information through prompt tuning. Unlike traditional language-image models that rely on a single fixed text prompt, CELIN employs a position-aware context ensemble strategy to integrate position-specific prompt vectors, enabling the model to capture global contextual information and reduce background interference. In addition, a cross-class mask method is introduced to differentiate between similar defect categories by blocking interclass interactions during attention computation, thereby minimizing misclassification. Extensive experiments on our own few-shot photovoltaic film defect dataset and various public benchmarks demonstrate that CELIN significantly outperforms existing methods.
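
The cross-class mask idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architectural details, so the function names, the per-token `class_ids` layout, and the toy scores below are all assumptions made for illustration. The sketch shows only the general technique of zeroing attention between tokens that belong to different classes before normalization.

```python
import math

def cross_class_mask(class_ids):
    """Boolean matrix: True where token i may attend to token j (same class).

    Assumption: each token carries one integer class label.
    """
    return [[ci == cj for cj in class_ids] for ci in class_ids]

def masked_attention_weights(scores, allowed):
    """Row-wise softmax over `scores`, ignoring positions where `allowed` is False."""
    weights = []
    for row, allow_row in zip(scores, allowed):
        # Subtract the max over allowed entries for numerical stability.
        m = max(s for s, a in zip(row, allow_row) if a)
        # Blocked positions contribute exactly zero weight.
        exps = [math.exp(s - m) if a else 0.0 for s, a in zip(row, allow_row)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical example: four prompt tokens, two per defect class.
class_ids = [0, 0, 1, 1]
scores = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 9.0, 9.0],
    [2.0, 2.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
allowed = cross_class_mask(class_ids)
weights = masked_attention_weights(scores, allowed)
```

In this toy case, tokens of class 0 receive zero attention weight on tokens of class 1 no matter how large the raw scores are, which is the sense in which interclass interactions are blocked.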

Original language: English (US)
Journal: IEEE Transactions on Industrial Informatics
State: Accepted/In press - 2025

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Information Systems
  • Computer Science Applications
  • Electrical and Electronic Engineering

Keywords

  • Contextual ensemble prompt tuning
  • few-shot surface defect detection
  • photovoltaic film defect detection
  • vision-language multimodal model
