Abstract
The rapid proliferation of Internet-connected devices has amplified online activity but also escalated the complexity of network threats. Traditional methods that rely on statistical or raw byte-based analysis often fail to capture the full behavior of network traffic, leading to potential information loss. In this article, a novel method for network anomaly detection based on cross-modal contrastive learning is proposed. By fusing intermediate “multimodal” representations of traffic data (byte grayscale images and statistical sequences) via contrastive learning, our method enhances the robustness of the traffic representation. A cross-modal Transformer encoder performs the fusion, further strengthening this representation and addressing the limitations of traditional methods. Within the contrastive learning framework, a dynamically increasing temperature coefficient is designed to adjust the pre-training model. Additionally, leveraging self-supervised contrastive learning reduces reliance on labeled samples while enhancing feature extraction capabilities. Extensive experiments on multiple real datasets validate the effectiveness of our method, demonstrating significant improvements in recall and precision over existing approaches. Finally, exploiting the cross-scenario invariance of contrastive learning, we also apply the pre-trained model to encrypted traffic to explore its generalization performance.
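To make the core mechanism concrete, the following is a minimal sketch of a symmetric cross-modal contrastive (InfoNCE-style) loss between the two modality embeddings the abstract describes (byte grayscale images and statistical sequences), together with a hypothetical linear schedule for the dynamically increasing temperature coefficient. The function names, the linear schedule, and the start/end temperatures `t_start`/`t_end` are illustrative assumptions, not the paper's actual implementation details.

```python
import numpy as np

def info_nce_loss(z_img, z_stat, temperature):
    """Symmetric InfoNCE loss between image-modality and
    statistics-modality embeddings; row i of each matrix is the
    same traffic sample, so positives lie on the diagonal."""
    # L2-normalize so dot products are cosine similarities.
    z_img = z_img / np.linalg.norm(z_img, axis=1, keepdims=True)
    z_stat = z_stat / np.linalg.norm(z_stat, axis=1, keepdims=True)
    logits = z_img @ z_stat.T / temperature      # (N, N) similarity matrix
    labels = np.arange(len(z_img))               # positive pairs: (i, i)

    def xent(l):
        # Cross-entropy of each row against its diagonal target.
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # Average both matching directions: image->stat and stat->image.
    return 0.5 * (xent(logits) + xent(logits.T))

def temperature_at(step, total_steps, t_start=0.05, t_end=0.2):
    """Hypothetical linear schedule: the temperature coefficient
    grows as pre-training proceeds (a stand-in for the paper's
    dynamically increasing temperature)."""
    frac = min(step / total_steps, 1.0)
    return t_start + frac * (t_end - t_start)
```

A lower temperature early in pre-training sharpens the softmax and focuses the model on hard negatives; gradually raising it softens the distribution, which is one plausible reading of why a dynamically increasing coefficient would stabilize the later stages of pre-training.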
| Original language | English (US) |
|---|---|
| Article number | 111723 |
| Journal | Computer Networks |
| Volume | 272 |
| DOIs | |
| State | Published - Nov 2025 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
Keywords
- Contrastive learning
- Multimodal
- Network traffic
- Pre-trained
- Transformer