U.S. Department of Energy

Pacific Northwest National Laboratory



Machine Learning String Tools for Operational and Network Security

Cyber security tools typically rely on algorithms that use legacy knowledge from prior events. For example, software analysis primarily employs hashing schemes to create unique identifiers for binary streams, and rule-based systems search network transactions for patterns that indicate malicious intent. Such reliance on relatively fixed signatures does not foster resilience or adaptation in addressing today's sophisticated threats.
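
To illustrate why fixed signatures are brittle, the following minimal sketch hashes two byte strings that differ by a single byte. The sample inputs and the choice of SHA-256 are assumptions made for illustration; the article does not say which hashing scheme any particular tool uses.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest, used here as an exact-match identifier."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample inputs: the variant differs from the original by one byte.
original = b"GET /index.html HTTP/1.1"
variant = b"GET /index.htm1 HTTP/1.1"

# Identical inputs always produce the same identifier.
same = fingerprint(original) == fingerprint(original)

# A one-byte change produces a different identifier, so an exact-match
# lookup against known hashes would no longer recognize the variant.
changed = fingerprint(original) != fingerprint(variant)

print(same, changed)
```

The digest reports only whether two inputs are identical; it carries no information about how close two different inputs are, which is the gap that similarity metrics aim to fill.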

At PNNL, the MLSTONES project has developed a mathematical formulation and computational infrastructure for developing new similarity metrics for cyber entities such as network transactions, source code, and instructions executing on a processor. Similarity metrics allow one to characterize the unknown (real-time use) and infer inheritance history (forensic or attributional use). MLSTONES applications are implemented using data-intensive computing to drive analysis at high throughput.

MLSTONES draws on two mature mathematical disciplines to enable string analysis that does not rely on exact or regular expression matching, or manually derived rules. These disciplines are:

  • Bioinformatics: the analysis of common inheritance among biological molecular sequences, and
  • Support vector machine classification: an example of supervised learning.

Their combination allows one to associate two non-exact strings with each other when they share a common ancestor by ‘learning’ complex patterns of similarity from training data. Translated to cyber security, this allows us to quantify the nearness of cyber ‘entities’ leading to applications in rapid characterization and forensics.
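
As a rough illustration of the idea, the sketch below scores two strings with Smith-Waterman local alignment, a standard bioinformatics technique, and turns the scores into a vector of similarities to a set of reference strings. This is not the MLSTONES implementation: the function names, scoring parameters, normalization, and sample strings are all assumptions chosen for illustration. Vectors of this kind could be supplied to a supervised classifier such as a support vector machine; that training step is omitted here.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions, not tuned values.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b):
    """Scale the alignment score to [0, 1] using the smaller self-alignment score."""
    if not a or not b:
        return 0.0
    denom = min(local_alignment_score(a, a), local_alignment_score(b, b))
    return local_alignment_score(a, b) / denom


def feature_vector(s, references):
    """Describe a string by its similarity to each labeled reference string."""
    return [similarity(s, r) for r in references]


# Hypothetical reference strings standing in for known cyber entities.
references = ["GET /index.html", "POST /login"]
print(feature_vector("GET /index.htm1", references))
```

Unlike an exact match or a regular expression, the score degrades gradually as two strings diverge, so a near variant of a known string still lands close to it.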

The MLSTONES infrastructure is poised to significantly advance predictive methods, moving them from rule-based, reactive strategies to proactive ones. If successful, this can shift the burden of complexity from the defender to the attacker, eliminating many actors who don’t have access to significant resources of time, money, and expertise.

Article Title: MLStones

Article Added: 2010/08/22

Category(s): Analytic Algorithms, Cyber Security

Last Update: 13 July 2011 | Pacific Northwest National Laboratory