Anomaly Detection in Catalog Streams

Chen Yang, Zhihui Du, Xiaofeng Meng, Xukang Zhang, Xinli Hao, David A. Bader

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Detecting anomalies accurately and in real time from large volumes of streaming data is a challenge for many real-world applications, such as smart cities, astronomical observation, and remote sensing. This article focuses on a special kind of stream, the catalog stream, whose high-level catalog structure can be used to analyze the stream effectively. We first formulate anomaly detection in catalog streams as a constrained optimization problem based on a catalog stream matrix. We then propose a novel filtering-identifying based anomaly detection algorithm (FIAD), which combines two complementary strategies, true event identifying and false alarm filtering, and pairs a data-oriented general method with a domain-oriented specific method to detect truly valuable anomalies. Furthermore, different kinds of attention windows are developed to supply the appropriate data to each algorithm component. A scalable and lightweight catalog stream processing framework, CSPF, is designed to support and implement the proposed method efficiently, and a prototype system is developed to evaluate the algorithm. Extensive experiments are conducted on catalog stream data sets from an operational, super-large field-of-view, high-cadence astronomical observation project. The experimental results show that the proposed method achieves a false-positive rate as low as 0.04%, reduces false alarms by 98.6% compared with existing methods, and handles each catalog with a latency of 2.1 seconds (well under the required 15 seconds). In addition, a total of 36 transient candidates, including seven microlensing events, 27 superflares, and two dual-superflares, are detected from 21.67 million stars (involving 1.09 million catalogs) in one observation season.
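To make the identify-then-filter idea concrete, the following is a minimal Python sketch. It is not the FIAD algorithm from the article. It assumes a catalog stream matrix in which each row is one catalog (one time step) and each column is one star's brightness, uses the preceding `window` rows as a stand-in for an attention window, flags a value that deviates strongly from the window median, and then discards catalogs in which most stars are flagged at once (treated here as a frame-wide artifact). The function name, parameters, thresholds, and sample data are all hypothetical.

```python
from statistics import median

# Hypothetical catalog stream matrix: 8 catalogs (rows) of 4 stars (columns).
# Row 5 has one star brightening; row 6 has every star shifted together.
CATALOG_MATRIX = [
    [10.00, 12.00, 11.00, 13.00],
    [10.01, 12.01, 11.01, 13.01],
    [9.99, 11.99, 10.99, 12.99],
    [10.02, 12.02, 11.02, 13.02],
    [9.98, 11.98, 10.98, 12.98],
    [10.00, 12.00, 12.50, 13.00],
    [11.00, 13.00, 12.00, 14.00],
    [10.01, 12.01, 11.01, 13.01],
]


def detect_candidates(matrix, window=5, k=5.0, max_flagged_fraction=0.5):
    """Return (catalog_index, star_index) pairs kept as candidate anomalies.

    All thresholds are illustrative assumptions, not values from the article.
    """
    candidates = []
    n_stars = len(matrix[0])
    for t in range(window, len(matrix)):
        flagged = []
        for j in range(n_stars):
            # Assumed attention window: the previous `window` values of star j.
            history = [matrix[i][j] for i in range(t - window, t)]
            center = median(history)
            # Median absolute deviation as a robust spread estimate;
            # the floor keeps the threshold from collapsing to zero.
            spread = max(median([abs(v - center) for v in history]), 1e-3)
            # Identifying step: flag a strong deviation from the window median.
            if abs(matrix[t][j] - center) > k * spread:
                flagged.append(j)
        # Filtering step: if most stars in one catalog are flagged together,
        # assume a frame-wide artifact and drop the whole catalog's flags.
        if len(flagged) / n_stars <= max_flagged_fraction:
            candidates.extend((t, j) for j in flagged)
    return candidates


if __name__ == "__main__":
    print(detect_candidates(CATALOG_MATRIX))
```

On the sample matrix, only the single-star brightening in row 5 survives. The row-6 shift is flagged for every star and is therefore filtered out, which shows in miniature why a filtering stage can cut false alarms without suppressing isolated events.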

Original language: English (US)
Pages (from-to): 294-311
Number of pages: 18
Journal: IEEE Transactions on Big Data
Volume: 9
Issue number: 1
State: Published - Feb 1 2023

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management

Keywords

  • Streaming data analysis
  • anomaly detection
  • big scientific data
  • distributed stream processing
