Experimental challenges in cyber security: A story of provenance and lineage for malware

Tudor Dumitraş, Iulian Neamtiu

Research output: Contribution to conference › Paper › peer-review

21 Scopus citations


Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches for cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about the collection process of the existing data sets, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis. As a first step towards rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics and time series analysis. Our approach is based on the observation that the long evolution histories of open source projects provide an opportunity for creating precise models of lineage and provenance, which can be used for detecting and clustering malware as well. As a second step, we present experimental methods that combine the use of a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose (understanding lineage and provenance), they also provide a general blueprint for addressing the threats to the validity of cyber security studies.

Original language: English (US)
State: Published - 2011
Externally published: Yes
Event: 4th Workshop on Cyber Security Experimentation and Test, CSET 2011 - San Francisco, United States
Duration: Aug 8 2011 → …


Conference: 4th Workshop on Cyber Security Experimentation and Test, CSET 2011
Country/Territory: United States
City: San Francisco
Period: 8/8/11 → …

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Safety, Risk, Reliability and Quality

