CrossModal-CLIP: A novel multimodal contrastive learning framework for robust network traffic anomaly detection

  • Cheng Fang
  • Yingkun Liu
  • Shenglin Teng
  • Mingrui Yin
  • Tao Han

Research output: Contribution to journal › Article › peer-review

Abstract

The rapid proliferation of Internet-connected devices has amplified online activities but also escalated the complexity of network threats. Traditional methods relying on statistical and raw byte-based analysis often fail to capture the comprehensive behavior of network traffic, leading to potential information loss. In this article, a novel method for network anomaly detection using cross-modal contrastive learning is proposed. By fusing intermediate “multimodal” representations of traffic data (byte grayscale images and statistical sequences) via contrastive learning, our method enhances the robustness of the traffic representation. A cross-modal Transformer encoder used for fusion further strengthens this representation, addressing the limitations of traditional methods. In contrastive learning, a dynamically increasing temperature coefficient is designed to adjust the pre-training model. Additionally, self-supervised contrastive learning reduces reliance on labeled samples while enhancing feature extraction capabilities. Extensive experiments on multiple real datasets validate the effectiveness of our method, showing significant improvements in recall and precision over existing approaches. In addition, relying on the invariance of contrastive learning across scenarios, we also apply the pre-trained model to an encrypted environment to explore its generalization performance.
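The abstract does not give the exact loss or the temperature schedule, so the following is only a minimal sketch of how a cross-modal contrastive objective with an increasing temperature might look. It assumes a CLIP-style symmetric InfoNCE loss over paired embeddings of the two modalities (byte grayscale image and statistical sequence) and a linear temperature schedule. The function names (`temperature_at`, `symmetric_contrastive_loss`) and the start and end temperatures are hypothetical and chosen for illustration; they are not taken from the article.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def temperature_at(step, total_steps, t_start=0.05, t_end=0.2):
    """Assumed linear schedule: temperature rises from t_start to t_end over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_emb, seq_emb, temperature):
    """CLIP-style loss: the i-th image and i-th sequence embedding form a positive pair."""
    img = [l2_normalize(v) for v in image_emb]
    seq = [l2_normalize(v) for v in seq_emb]
    # Similarity matrix: rows index images, columns index sequences.
    logits = [[sum(a * b for a, b in zip(i, s)) / temperature for s in seq]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # sequence-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

In this sketch, a low temperature early in pre-training sharpens the softmax so that matched pairs are separated strongly from mismatched ones, while a higher temperature later softens it. Whether the article's schedule is linear, or what range it covers, is not stated in the abstract.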

Original language: English (US)
Article number: 111723
Journal: Computer Networks
Volume: 272
State: Published - Nov 2025

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications

Keywords

  • Contrastive learning
  • Multimodal
  • Network traffic
  • Pre-trained
  • Transformer
