Anomaly detection for data streams in large-scale distributed heterogeneous computing environments

Yue Dang, Bin Wang, Ryan Brant, Zhiping Zhang, Maha Alqallaf, Zhiqiang Wu

Research output: Contribution to journal › Article › peer-review

Abstract

Counteracting cyber threats to ensure a secure cyberspace faces great challenges: cyber-attacks are increasingly stealthy and sophisticated, and the protected cyber domains exhibit rapidly growing complexity and scale. It is important to design big data-driven cyber security solutions that effectively and efficiently derive actionable intelligence from available heterogeneous sources of information using principled data analytic methods to defend against cyber threats. In this work, we present a scalable distributed framework to collect and process extreme-scale networking and computing system traffic and status data from multiple sources that collectively represent the system under study, and develop and apply real-time adaptive data analytics for anomaly detection to monitor, understand, maintain, and improve cybersecurity. The data analytics will integrate multiple sophisticated machine learning algorithms and human-in-the-loop for iterative ensemble learning. Given the volume, speed, and complex nature of the data gathered, plus the need for real-time data analytics, a scalable data processing framework needs to handle big data with low latency. Our proposed big-data analytics will be implemented using an Apache Spark computing cluster. The analytics developed will offer significant improvements over existing methods of anomaly detection in real time. Our preliminary evaluation studies have shown that the developed techniques achieve better capabilities for defending against cyber threats.

Original language: American English
Journal: Proceedings of the 12th International Conference on Cyber Warfare and Security, ICCWS 2017
State: Published - Jan 1 2017

Keywords

  • Anomaly detection, Apache Spark, Data analytics, Distributed processing framework, Large-scale cyber system
