Abstract
Rigorous experiments and empirical studies hold the promise of empowering researchers and practitioners to develop better approaches for cyber security. For example, understanding the provenance and lineage of polymorphic malware strains can lead to new techniques for detecting and classifying unknown attacks. Unfortunately, many challenges stand in the way: the lack of sufficient field data (e.g., malware samples and contextual information about their impact in the real world), the lack of metadata about the collection process of the existing data sets, the lack of ground truth, and the difficulty of developing tools and methods for rigorous data analysis. As a first step towards rigorous experimental methods, we introduce two techniques for reconstructing the phylogenetic trees and dynamic control-flow graphs of unknown binaries, inspired by research in software evolution, bioinformatics, and time series analysis. Our approach is based on the observation that the long evolution histories of open source projects provide an opportunity for creating precise models of lineage and provenance, which can also be used for detecting and clustering malware. As a second step, we present experimental methods that combine the use of a representative corpus of malware and contextual information (gathered from end hosts rather than from network traces or honeypots) with sound data collection and analysis techniques. While our experimental methods serve a concrete purpose, namely understanding lineage and provenance, they also provide a general blueprint for addressing the threats to the validity of cyber security studies.
Original language | English (US) |
---|---|
State | Published - 2011 |
Externally published | Yes |
Event | 4th Workshop on Cyber Security Experimentation and Test, CSET 2011 - San Francisco, United States; Duration: Aug 8 2011 → … |
Conference
Conference | 4th Workshop on Cyber Security Experimentation and Test, CSET 2011 |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 8/8/11 → … |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Safety, Risk, Reliability and Quality