"Trust, but verify" -- a Russian proverb.
The rise of fake news has made some organizations consider the implications of unverified data.
A previous article called attention to the data fidelity problem and proposed a method to bind the data to the environment in which the data object was created. The idea was only a concept, but now the details have begun to emerge.
Understanding the context requires measuring the important assets in inventory along with the requirements of those assets. Just as insurance companies evaluate physical assets, cybersecurity requires similar rigor to provide accurate measurements, evaluations and predictions. Described in the abstract, this all seems quite simple; in practice, there is quite a bit of work ahead.
Data fidelity requires the contextual evaluation of data in terms of security. This means examining the data objects within the context of the environment in which they were created.
In order to gather this data, you must not only re-examine what you deem important, but you must do so within the context of the tasks that you are attempting to support. The task support piece is critical because this bounds the problem space in which you can work. If the problem space is not bounded, all of the solutions will remain brittle point solutions that continue to fail when new problems are introduced.
The ways systems can fail seem endless, but the ways systems can perform correctly are limited. This characteristic is key in any analysis that requires accurate predictions. Yet this same characteristic is often overlooked when attempting to accurately predict outcomes in the cyber domain.
Three disciplines can assist in creating the boundaries and gathering the contextual data required to ensure data fidelity: dependency modeling, resiliency and reliability.
Dependency modeling is used to define the desired outcome and how to achieve it. For example, in order for a legitimate alert to occur, a sensor must recognize an environmental perturbation.
For the sake of argument, we will set aside the signature versus anomaly detection discussion for now. The main point here is that when an alert occurs, a process runs differently than it does during normal, non-alert operation.
Data processing takes a different path, and all of this is dependent upon such items as memory, processing cycles and bandwidth. Dependency modeling covers the gathering, ordering and quantities associated with data contextualization.
When examining network traffic, dependency models show the flow of traffic from client to server and back, including traffic values associated with router data, the domain name system, the web server and other server session data required for the communication to occur. These sessions have characteristics that are consistent and measurable and can be quantified for predictive purposes.
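As a rough illustration of the dependency idea, the sketch below records which session components must be observed before others and reports any observed component whose prerequisites are missing. The component names and the `DEPENDENCIES` table are hypothetical, chosen only to mirror the client-to-server example above; a real model would also carry the quantities and ordering discussed earlier.

```python
# Hypothetical dependency model for a single web session. Each component
# maps to the components that must also be observed for it to be legitimate.
DEPENDENCIES = {
    "dns_lookup": [],
    "router_flow": ["dns_lookup"],
    "web_server_session": ["router_flow"],
}


def unmet_dependencies(observed, dependencies=DEPENDENCIES):
    """Return {component: [missing direct prerequisites]} for observed components."""
    unmet = {}
    for component in observed:
        missing = [
            dep for dep in dependencies.get(component, [])
            if dep not in observed
        ]
        if missing:
            unmet[component] = missing
    return unmet


# A complete session has no unmet dependencies.
complete = {"dns_lookup", "router_flow", "web_server_session"}
print(unmet_dependencies(complete))

# A web server session with no matching router flow is worth a closer look.
print(unmet_dependencies({"web_server_session"}))
```

Only direct prerequisites are checked here; a fuller model might walk the whole chain and compare measured traffic values against expected ones.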
Resilience processing deals with the fact that the mission goals may be under attack and that successful planning requires robustness, redundancy, resourcefulness and rapidity. Here, again, we find the need to understand what components are critical to successful planning, including environmental variables.
Additionally, organizations should consider prioritizing and determining which functions require redundancies and under which conditions. Finally, resiliency introduces a temporal component into the model. The temporal component becomes another variable that helps to bound the solution and increase accuracy in predictions.
The third discipline, reliability, recognizes not only the temporal component, but also the environmental component -- hence the requirement that the environment be defined when attempting to model reliability.
However, environments are not always static. In fact, the environment in the cybersecurity world is rather dynamic. For this reason, processing states should be defined. Processing states can be generally defined within the terms of the operational profile, a numerical model of operations in the environment.
Operational profiles are measured in each of the meta-states of processing: startup, idle, normal, busy, stressed, failing and dead. In some of these states (normal, busy, stressed and failing), variance in the measurements is expected.
In others (startup and idle), variance suggests behavior that may coincide with a compromise. The most important part is that the states be defined and included in the planned operating environment. Many of the variables required for network operational profiles are the same variables that can be obtained from quality of service data.
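To make the state-dependent treatment of variance concrete, here is a minimal sketch. It assumes a single numeric metric (for example, bandwidth in kilobits per second), a baseline of earlier samples for the current state, and a three-standard-deviation tolerance; the function name, the tolerance and the sample values are all illustrative assumptions rather than part of any particular product or standard.

```python
from enum import Enum
from statistics import mean, pstdev


class State(Enum):
    STARTUP = "startup"
    IDLE = "idle"
    NORMAL = "normal"
    BUSY = "busy"
    STRESSED = "stressed"
    FAILING = "failing"
    DEAD = "dead"


# States in which variance may coincide with a compromise, per the text.
VARIANCE_SUSPICIOUS = {State.STARTUP, State.IDLE}


def is_suspicious(state, baseline, sample, tolerance=3.0):
    """Flag a sample only when it deviates in a state where variance is unexpected."""
    if state not in VARIANCE_SUSPICIOUS:
        return False  # variance is tolerated (or not assessed) in this state
    center = mean(baseline)
    spread = pstdev(baseline)
    if spread == 0:
        return sample != center
    return abs(sample - center) > tolerance * spread


idle_baseline = [10, 10, 11, 9, 10]  # hypothetical idle-state measurements
print(is_suspicious(State.IDLE, idle_baseline, 50))  # large deviation while idle
print(is_suspicious(State.BUSY, idle_baseline, 50))  # same deviation while busy
```

The point of the sketch is the design choice, not the statistics: the same measurement is judged differently depending on the processing state in which it was taken.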
How this differs from traditional anomaly detection
It's important to note how this processing differs from traditional anomaly detection. The differences occur in two main areas: first, the session-oriented nature of the work, and second, the binding of environmental variables to the data objects that are observed and that provide the basis for security operations.
Simply put, alerts can be manipulated; however, when a true alert occurs, other areas of the network that have not been historically monitored can reveal traces of the event. These environmental traces can be thought of as the waves that appear when water is disturbed.
If you envision the virtual environment as the water that surrounds a rock in a lake, then when the rock is disturbed, you can examine the ripple pattern in addition to the rock. Similarly, when the data in a virtual environment is disturbed, the environment that surrounds that data is also disturbed, and you can see the ripples if you choose to look for them. The three disciplines of dependency modeling, resilience and reliability can help make the task manageable and ensure data fidelity.
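The ripple idea can be sketched in a few lines: when an alert fires, look at the surrounding environment and count how many metrics have moved outside their usual range. The metric names, the baseline ranges and the two-ripple threshold below are hypothetical values picked for illustration only.

```python
# Hypothetical baseline ranges (low, high) for metrics surrounding the data.
BASELINE_RANGES = {
    "dns_queries_per_min": (20, 200),
    "router_bytes_per_sec": (1_000, 50_000),
    "server_cpu_pct": (5, 60),
}


def find_ripples(snapshot, ranges=BASELINE_RANGES):
    """Return the sorted names of metrics that fall outside their baseline range."""
    ripples = []
    for name, (low, high) in ranges.items():
        value = snapshot.get(name)
        if value is not None and not (low <= value <= high):
            ripples.append(name)
    return sorted(ripples)


def alert_is_corroborated(snapshot, min_ripples=2, ranges=BASELINE_RANGES):
    """Treat an alert as corroborated when enough environmental metrics are disturbed."""
    return len(find_ripples(snapshot, ranges)) >= min_ripples


# Environment snapshot taken around the time of an alert (made-up numbers).
disturbed = {
    "dns_queries_per_min": 900,
    "router_bytes_per_sec": 2_000,
    "server_cpu_pct": 95,
}
print(find_ripples(disturbed))
print(alert_is_corroborated(disturbed))
```

An alert with no ripples around it is not necessarily false, and ripples without an alert may be just as interesting; the sketch only shows how the environment can be consulted alongside the alert itself.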