Drifted Twitter Spam Classification Using Multiscale Detection Test on K-L Divergence

Xuesong Wang, Qi Kang, Jing An, Mengchu Zhou

Research output: Contribution to journal › Article › peer-review

36 Scopus citations

Abstract

Twitter spam classification is a tough challenge for social media platforms and cyber security companies. Twitter spam with illegal links may evolve over time in order to deceive filtering models, causing severe losses to both users and the whole network. We define this distributional evolution as a concept drift scenario. To build an effective model, we adopt K-L divergence to represent the spam distribution and use a multiscale drift detection test (MDDT) to localize possible drifts therein. A base classifier is then retrained based on the detection result to improve performance. Comprehensive experiments show that K-L divergence exhibits highly consistent change patterns across features when a drift occurs. The MDDT is also shown to be effective in improving the final classification results in terms of accuracy, recall, and F-measure.
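As a rough illustration of the idea in the abstract, the sketch below estimates the K-L divergence between a reference window and later windows of a single feature stream, and flags windows whose divergence exceeds a threshold. This is not the authors' MDDT: the single-scale fixed windows, the histogram estimator, the smoothing constant `alpha`, the `threshold` value, and all function names are assumptions made for illustration only.

```python
import math
import random


def histogram(samples, lo, hi, bins):
    """Count samples into `bins` equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def kl_divergence(p_samples, q_samples, bins=20, alpha=1.0):
    """Histogram estimate of D_KL(P || Q) from two 1-D samples.

    `alpha` is an additive smoothing pseudo-count (an assumption of this
    sketch) that keeps empty bins from producing log(0) or division by zero.
    """
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    if hi == lo:
        hi = lo + 1.0  # degenerate case: all values identical
    p_counts = histogram(p_samples, lo, hi, bins)
    q_counts = histogram(q_samples, lo, hi, bins)
    p_total = len(p_samples) + alpha * bins
    q_total = len(q_samples) + alpha * bins
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + alpha) / p_total
        q = (qc + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def detect_drift(stream, window=200, threshold=0.5):
    """Return start indices of windows that diverge from the first window."""
    reference = stream[:window]
    flagged = []
    for start in range(window, len(stream) - window + 1, window):
        current = stream[start:start + window]
        if kl_divergence(reference, current) > threshold:
            flagged.append(start)
    return flagged


# Synthetic feature stream (hypothetical data): the mean shifts at index 600.
rng = random.Random(0)
stream = [rng.gauss(0.0, 1.0) for _ in range(600)]
stream += [rng.gauss(3.0, 1.0) for _ in range(600)]
flagged = detect_drift(stream)
```

In this toy setup, windows drawn after the shift would be expected to diverge strongly from the reference window, while earlier windows should stay near zero. A retraining step, as the abstract describes, would then be triggered from the first flagged window onward.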

Original language: English (US)
Article number: 8781937
Pages (from-to): 108384-108394
Number of pages: 11
Journal: IEEE Access
Volume: 7
State: Published - 2019

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • General Materials Science
  • General Engineering

Keywords

  • Concept drift
  • K-L divergence
  • drift detection test
  • twitter spam classification
