Abstract
The automatic detection of defects in industrial photovoltaic film is crucial for ensuring the reliability of photovoltaic modules. Key challenges include limited defect samples, interclass feature similarities, and interference from complex backgrounds. Existing deep learning-based methods require large-scale datasets and focus solely on visual data, which limits their effectiveness in few-shot defect detection (FSDD). To address these challenges, we propose the Contextual Ensemble Language-Image multimodal Network (CELIN), which enhances FSDD in photovoltaic films by incorporating textual information through prompt tuning. Unlike traditional language-image models that rely on a single fixed text prompt, CELIN employs a position-aware context ensemble strategy to integrate position-specific prompt vectors, enabling the model to capture global contextual information and reduce background interference. In addition, a cross-class mask method is introduced to differentiate between similar defect categories by blocking interclass interactions during attention computation, thereby minimizing misclassification. Extensive experiments on our own few-shot photovoltaic film defect dataset and various public benchmarks demonstrate that CELIN significantly outperforms existing methods.
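The abstract does not specify how the cross-class mask is built, so the following is only a minimal sketch of the general idea of blocking interclass interactions during attention computation. It assumes each token carries a class label and that a token may attend only to tokens of the same class. The names `cross_class_mask` and `masked_attention`, the toy token values, and the pure-Python attention are illustrative assumptions, not CELIN's implementation.

```python
import math

def cross_class_mask(class_ids):
    """allowed[i][j] is True only when tokens i and j share a class label (assumed rule)."""
    return [[a == b for b in class_ids] for a in class_ids]

def masked_attention(q, k, v, allowed):
    """Scaled dot-product attention in which disallowed pairs receive zero weight."""
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        # Blocked pairs get a score of -inf, which becomes weight 0 after softmax.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if allowed[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite, because every token may attend to itself
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        w = [e / total for e in exps]
        weights.append(w)
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights

# Hypothetical toy input: four prompt tokens, two per defect class.
class_ids = [0, 0, 1, 1]
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
allowed = cross_class_mask(class_ids)
outputs, weights = masked_attention(tokens, tokens, tokens, allowed)
```

In this toy setup, the attention weights between tokens of class 0 and class 1 are exactly zero, so each class's representation is computed without mixing in features from the other, visually similar class.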
Original language | English (US) |
---|---|
Journal | IEEE Transactions on Industrial Informatics |
State | Accepted/In press - 2025 |
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Information Systems
- Computer Science Applications
- Electrical and Electronic Engineering
Keywords
- Contextual ensemble prompt tuning
- few-shot surface defect detection
- photovoltaic film defect detection
- vision-language multimodal model