When the signature-based model was originally introduced for Internet security as a faster decision-making model for traffic filtering, signature databases were manageable and zero-day attacks could be imagined but were generally not executed.
However, problems began to surface when attempting to scale the signature model for the increasing volume of new signatures. The recent failures of signature-based technologies were highlighted by the famous malware trio of
These developments have led many in the security community to call for the adoption of the anomaly-based monitoring model. In this tip, we'll define anomaly-based malware monitoring and explore the benefits for enterprise malware defense.
Defining anomaly-based monitoring
The anomaly-based monitoring model is not a new concept; in fact, this model was the old security model, known previously as "deny-all" or "permit by exception." This model was popular in the early days of Internet security, before the Web became a front end for services, when such services were fewer and easier to manage.
An understanding of the anomaly-based monitoring model requires a familiarity with the definition of anomaly: a deviation from the norm; a strange condition, situation or quality; an incongruity or inconsistency. Within the context of the anomaly-based monitoring model, it is important to understand that not every anomaly represents a malicious event; anomaly detection technologies have historically had issues with false positives.
An anomaly is simply an event that is out of the norm, which ultimately means that the processing of an anomaly by a person determines whether an event is malicious. Later in this tip we will discuss some of the available tools and methods that can assist decision makers in the anomaly-based monitoring model.
The process of anomaly collection
In order to comprehend how anomaly detection technologies function, an initial discussion of the collection technologies is helpful. Collection technologies can be separated into two basic types: the heuristic method and the policy profile method.
The heuristic method relies on learning the environment via collecting statistics on network traffic. The collection engine analyzes network traffic and collects statistics on IP addresses, services and traffic volume. Statistical analysis is then performed to determine high and low values, averages and other relevant traffic data for the environment. A profile is built based on the various values, with traffic being compared to the normal baseline that is established. Traffic that matches the profile is considered normal, while all other traffic is considered anomalous. The heuristic method offers a faster setup but is more likely to interpret a malicious event as normal behavior. This occurs when malicious activity is present during the "learning" phase.
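The learning and comparison steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the function names, the per-service volume samples and the three-standard-deviation cutoff `k` are all assumptions made for the example.

```python
from statistics import mean, stdev

def learn_baseline(samples):
    """Build a per-service profile from traffic volumes seen in the learning phase."""
    profile = {}
    for service, volumes in samples.items():
        profile[service] = {
            "mean": mean(volumes),
            "stdev": stdev(volumes),  # needs at least two samples per service
            "low": min(volumes),
            "high": max(volumes),
        }
    return profile

def is_anomalous(profile, service, volume, k=3.0):
    """Flag unknown services, or volumes more than k standard deviations from the mean."""
    stats = profile.get(service)
    if stats is None:
        return True  # a service never seen during learning does not match the profile
    return abs(volume - stats["mean"]) > k * stats["stdev"]

# Hypothetical learning-phase data: traffic volume per interval, by service.
learning_phase = {
    "http": [120, 130, 125, 118, 127],
    "dns": [40, 42, 38, 41, 39],
}
baseline = learn_baseline(learning_phase)

print(is_anomalous(baseline, "http", 126))  # close to the learned average
print(is_anomalous(baseline, "http", 400))  # far outside the learned range
print(is_anomalous(baseline, "ssh", 10))    # service absent from the profile
```

The sketch also makes the weakness visible: whatever appears in `learning_phase`, malicious or not, becomes part of the baseline.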
The other collection technology is the policy profile method, sometimes referred to as knowledge profile. This method borrows from the operational profile definition used in software reliability. In this case, the profile can be determined as the executable processes and their associated probabilities. If the heuristic method relies on machines learning the environment, the policy profile method relies on an operator having exacting knowledge of the environment and being able to build the profile into the machine with the known data. The operator knows the policy, services, assets and how services access assets on the network. The knowledge is quantified and entered into the profile. The policy profile model works best in small, constrained environments, while the biggest drawback associated with this model is the labor-intensive upfront effort required to define the profile. Large enterprise environments may consider this model too costly, especially considering it might also yield too many false positives to be useful.
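By contrast, a policy profile can be pictured as an operator-written list of permitted flows, with everything else treated as anomalous. The sketch below is a deliberate simplification that leaves out the probabilities mentioned above; the `Flow` fields and the sample entries are hypothetical names chosen for illustration.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    source: str   # who initiates the access
    service: str  # which service is used
    asset: str    # which asset is reached

# Operator-defined profile: the only flows the policy permits (hypothetical entries).
POLICY_PROFILE = {
    Flow("hr-workstations", "https", "payroll-server"),
    Flow("web-tier", "sql", "orders-db"),
}

def check_flow(flow, profile=POLICY_PROFILE):
    """Return 'normal' for flows the operator listed and 'anomalous' for all others."""
    return "normal" if flow in profile else "anomalous"

print(check_flow(Flow("web-tier", "sql", "orders-db")))
print(check_flow(Flow("hr-workstations", "sql", "orders-db")))
```

Every legitimate flow that the operator forgets to list shows up as an anomaly, which is one way the upfront effort and the false positives are linked.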
Each method features strengths and weaknesses. In both cases, anomaly processing is a labor-intensive effort that requires a decision maker. Even when anomaly-detection technologies offer methods to process anomalies, they are, in reality, offering decision support capabilities that still require a knowledgeable network management analyst. In security circles, the analyst has been historically viewed as the weak link, but part of the evolution of the anomaly-based monitoring method is the now extremely important role of the analyst.
Anomaly processing can rely on any one of several models or a combination of models. In most cases, these models can work with either collection method, but each model typically lends itself more to one of them. Possible models include decision trees, fuzzy logic, neural networks, Bayesian analysis, Markov models and clustering.
Decision trees are similar to attack trees and fault trees. Their construction is based on events that can be viewed as failures; reverse analysis is performed in order to determine the causes or active faults that lead to the failure. As a single event is not usually the cause of a failure or intrusion, this analysis works well with model combinations. However, in order to build a representative tree, the ability to anticipate all sorts of problems becomes important; otherwise, zero-day attacks will likely threaten this model. Because of its discrete nature, this method tends to operate best with policy/profile-based data.
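To make the reverse analysis concrete, the following sketch evaluates a small attack-tree-style structure built from AND and OR gates. It is an assumed, simplified representation: the tree layout, the event names and the `evaluate` function are illustrative and not taken from any specific tool.

```python
def evaluate(node, observed):
    """Evaluate a tree whose leaves are event names and whose inner nodes are (gate, children)."""
    if isinstance(node, str):
        return node in observed  # a leaf is true when the event was observed
    gate, children = node
    results = [evaluate(child, observed) for child in children]
    if gate == "AND":
        return all(results)
    if gate == "OR":
        return any(results)
    raise ValueError(f"unknown gate: {gate}")

# Hypothetical tree: an intrusion needs an entry point AND a privilege escalation.
INTRUSION_TREE = ("AND", [
    ("OR", ["phishing_click", "exposed_service"]),
    "privilege_escalation",
])

print(evaluate(INTRUSION_TREE, {"phishing_click", "privilege_escalation"}))
print(evaluate(INTRUSION_TREE, {"phishing_click"}))
```

Any event that is missing from the tree can never contribute to a result, which is why unanticipated attacks are a problem for this model.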
Fuzzy logic operates in the gray area between anomaly detection and malicious behavior. This processing method, based on the fuzzy set theory, is characterized by reasoning that is approximate in nature. Fuzzy logic typically provides information on the average traffic patterns, and the range of standard deviation values from the average. With this information, the user determines the point at which behavior is considered anomalous. The lack of hard boundaries subjects fuzzy logic technology to criticism from statisticians and mathematicians, who typically prefer probabilistic models. This approach can be used with either policy/profile- or heuristic-based collection techniques.
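One simple way to picture the lack of hard boundaries is a membership function that returns a degree of "anomalousness" between 0 and 1 instead of a yes-or-no answer. The sketch below assumes a linear ramp between one and three standard deviations from the average; those bounds, like the function name, are arbitrary choices for illustration, and the user would still pick the degree at which to act.

```python
def anomaly_membership(value, average, std_dev, lower=1.0, upper=3.0):
    """Return the degree (0.0 to 1.0) to which a value belongs to the 'anomalous' fuzzy set."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    distance = abs(value - average) / std_dev  # distance in standard deviations
    if distance <= lower:
        return 0.0  # clearly normal
    if distance >= upper:
        return 1.0  # clearly anomalous
    return (distance - lower) / (upper - lower)  # the gray area in between

# Hypothetical traffic average of 100 units with a standard deviation of 10.
for observed in (100, 120, 140):
    print(observed, anomaly_membership(observed, average=100, std_dev=10))
```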
Neural networks, as the name implies, can be thought of as mathematical representations of the human nervous system and are commonly used for predictive analysis. A neural network can create a profile by learning the mappings of input to output. By examining past patterns, it predicts future patterns. This profile helps create a trend, which leads to using probabilistic models in order to assist in visualizing and determining the likelihood of relevant anomalous events and activities. When this analysis is applied to traffic as well as assets, neural networks can provide assistance in explaining anomalous behaviors. While they can be used with either collection technique, neural networks are typically associated with the heuristic collection technique.
Bayesian analysis, which is sometimes used alongside neural networks in anomaly-based intrusion detection, relies on knowledge of all of the paths and the ability to weight the paths. Weighting can be based on flow data, past attack patterns, other criteria or a combination of criteria. The weighted values translate into percentages, thereby allowing the operator to allocate assets based on quantitative values that are used to predict vulnerabilities. Bayesian analysis works well with both collection techniques, though it is more often associated with heuristics.
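The path weighting described above is more involved than can be shown briefly, but the underlying update is Bayes' rule, which can be sketched on its own. The function name and the probabilities below are made-up values for illustration, not measurements.

```python
def posterior_malicious(prior, p_alert_given_malicious, p_alert_given_benign):
    """Apply Bayes' rule: probability that an event is malicious, given that it raised an alert."""
    evidence = (p_alert_given_malicious * prior
                + p_alert_given_benign * (1 - prior))
    if evidence == 0:
        raise ValueError("the alert can never occur under these inputs")
    return p_alert_given_malicious * prior / evidence

# Assumed inputs: 1% of events are malicious, 90% of malicious events raise an alert,
# and 5% of benign events raise one as well.
p = posterior_malicious(prior=0.01,
                        p_alert_given_malicious=0.90,
                        p_alert_given_benign=0.05)
print(f"{p:.1%} of alerts are expected to be malicious under these assumptions")
```

With a low prior, most alerts come from benign events even when the detector is fairly accurate, which is consistent with the false-positive issues noted earlier.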
Markov models, which are associated with several methods, are also used in analyzing probabilistic systems where state changes occur. When modeling a discrete space, the Markov process is known as a Markov chain and the model is referred to as a discrete-space Markov model. When modeling a space that isn't as well defined, the deployed Markov model is known as a continuous-space Markov model. Similarly, a model that specifies fixed time intervals for transitions is a discrete-time Markov model, and a model that allows transitions at any time is a continuous-time Markov model.
By combining Markov modeling and reward analysis, Markov reward analysis provides meaningful data that can be used to evaluate anomalies for potential intrusions. Markov model selection is based in part on the problem space. A static, well-defined environment, such as one defined in a policy/profile-based environment, uses a discrete model; a more fluid environment associated with heuristics is a better match for the continuous-time/space Markov model.
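For the discrete case, a Markov chain can be sketched by counting how often one state follows another in known-good activity and then flagging transitions that were rarely or never seen. The sketch covers only the chain itself, not reward analysis; the session states, function names and the 0.05 threshold are assumptions for illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate transition probabilities P(next | current) from observed state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

def unlikely_transitions(model, sequence, threshold=0.05):
    """Return (current, next, probability) for each transition below the threshold."""
    flagged = []
    for current, nxt in zip(sequence, sequence[1:]):
        p = model.get(current, {}).get(nxt, 0.0)  # unseen transitions get probability 0
        if p < threshold:
            flagged.append((current, nxt, p))
    return flagged

# Hypothetical sessions observed during normal operation.
normal_sessions = [
    ["login", "browse", "download", "logout"],
    ["login", "browse", "browse", "logout"],
    ["login", "download", "logout"],
]
model = fit_transitions(normal_sessions)

print(unlikely_transitions(model, ["login", "browse", "logout"]))
print(unlikely_transitions(model, ["login", "admin", "download", "logout"]))
```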
Clustering is helpful when identifying patterns in data because it provides visualization support of network activities. As sites will typically have more than one asset targeted, malicious events tend to be related, not singular activities. New anomalous activities will reflect that in clusters, with these clusters often forming patterns. Although clustering can be used to display various patterns, determining the necessary parameters is an ongoing effort. Clustering can be used with either collection method.
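As a minimal illustration of related events grouping together, the sketch below clusters anomalies purely by how close they are in time, which amounts to a one-dimensional, gap-based form of clustering. Real deployments would likely cluster on more features; the event tuples and the 60-second `max_gap` are assumed values, and `max_gap` is exactly the kind of parameter that needs ongoing tuning.

```python
def cluster_by_time(events, max_gap=60):
    """Group (timestamp_seconds, asset) events; a gap larger than max_gap starts a new cluster."""
    clusters = []
    for event in sorted(events):
        last_time = clusters[-1][-1][0] if clusters else None
        if last_time is not None and event[0] - last_time <= max_gap:
            clusters[-1].append(event)  # close enough to the previous event
        else:
            clusters.append([event])    # start a new cluster
    return clusters

# Hypothetical anomalies: three assets hit within a minute, then an unrelated event.
anomalies = [(45, "db01"), (0, "web01"), (900, "mail01"), (20, "web02")]
clusters = cluster_by_time(anomalies)

for cluster in clusters:
    print([asset for _, asset in cluster])
```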
The future of anomaly-based monitoring
The growth of big data will continue to drive migration toward anomaly detection technologies and better visualization tools, as many of the described processing methods are designed to scale in large enterprise environments. The anomaly-based monitoring model can potentially capture more malicious activity, such as zero-day attacks, insider threats and advanced persistent threats, than current signature-based technologies.
The ability to capture and process such a huge amount of data is promising, but the additional work required in processing anomalous data is significant and requires additional skills in analytic writing from security professionals. It will take time for enterprises to adopt anomaly-based monitoring.
Hybrid models are likely to emerge first, with small enclaves in large organizations using the policy/profile method while large groups use the heuristic method. Navigating the difficulties involved in changing to a new model will not be easy, but the advantages provided by the anomaly detection model will improve overall system and network security postures.
About the author:
Char Sample has close to 20 years of experience in Internet security, and she has been involved with integrating various security technologies in both the public and private sectors. She is a doctoral candidate at Capitol College, where her dissertation topic deals with the use of cultural markers in attack attribution.
This was first published in September 2012