- Andreas M. Antonopoulos
Security information management, or SIM, is the discipline and technology used to collect security information from log files and other sources in order to detect and react to security events. In practice, SIM is a lot broader still. Today’s servers, network devices and applications generate enormous volumes of logs and an almost endless variety of information. That mountain of data can be analyzed, correlated and filtered to reach conclusions about security, compliance and application performance, to name just a few. With so many possible uses, a SIM solution must be focused or it will succumb to information overload and performance problems and become unusable.
SET LIMITED GOALS FOR SIM
The function of SIM is to find indications of specific security events in a large set of security events. In plain terms, SIM is about finding a needle in a … needle stack. It is a difficult task to begin with, yet many companies make it even more difficult by trying to do too much with SIM, effectively making the “needle stack” bigger and bigger.
The first and most important part of deploying a SIM is to focus on a single goal or a limited set of goals. Is the SIM intended to detect intrusions? Is it intended to find compliance exceptions? Is it focused on internal or external attacks? With so many possible uses for the technology, it is easy to lose focus and answer “all of the above.” The more business problems you try to solve with SIM, the less effective it is at solving any one of them.
What logs should you collect? If you set a specific goal for SIM, then it is easier to decide what to collect. For example, if you have decided to focus on Payment Card Industry Data Security Standard (PCI-DSS) compliance, then firewall logs would be among the first you must collect and analyze. PCI is one of the few prescriptive regulations, so there is plenty of PCI-specific guidance on log collection. Other regulations are not so easy: HIPAA, for example, requires that you protect personal health information (PHI), but what logs do you collect to do that?
A common mistake in log collection occurs when security professionals apply filtering too early in the process. In one healthcare organization, for example, the SIM was configured to collect only log entries for failed login attempts. The reasoning was that such log entries would point to any attempts to break into the system. Unfortunately, when a breach did occur, the company in question discovered that the attackers had already compromised a user password. The attackers were not generating failed login entries; they were generating successful logins with a legitimate user account. Because those were not collected, the security experts brought in to analyze the breach were unable to see what the attackers had done.
This is a difficult balance to strike: if you don’t collect some specific detail it will not be there when you need to review historical data after a breach. But if you collect everything, you will have performance and storage growth problems. Little details that seem unimportant in advance may be the crucial piece of data you needed in retrospect.
SIMs can be used for more than forensic analysis of historical data after a breach. Many companies try to make SIM solutions the basis of real-time incident response in a security operations center (SOC). The distinction between real-time incident response and post-incident historical analysis is critical because those two goals are often at odds, if not directly conflicting. SIM solutions optimized for real-time analysis are tuned for high-performance correlation and filtering, minimizing the collected information as much as possible. To make a SIM actionable for historical analysis post-breach, you must tune it in almost exactly the opposite way, maximizing the information collected and stored.
Actionable results are results that can be used in the context of an operational workflow. If your goal is compliance, then the workflow should produce an annual audit report. If your goal is breach detection then the results should enable investigation of security alerts by an analyst with an incident response workflow. If you’re trying to improve security operations, then the SIM results should support a change and configuration management workflow. You can’t make a SIM produce actionable results without the context of a specific goal and workflow you’re trying to achieve.
MAKING SIM PART OF A SECURITY GOVERNANCE PROGRAM
Once you’ve focused your SIM solution on a single goal and successfully integrated it into your standard security operations, then you can start gradually expanding your focus to address other business challenges. This doesn’t mean collecting more data, because that will only make the SIM slower and less focused. Rather, it means finding new ways to re-use the SIM results as part of your overall security governance program.
Many companies are gradually shifting their attention from compliance to risk management. Risk management means looking at the risks to the organization and managing that risk by prioritizing security controls and incident response according to the risk to the business. For SIM solutions, that means correlating security event information with risk information such as:
- User identity/role: Which users/roles were involved or affected
- System risk: Whether the system affected is a business critical system
- Application risk: Whether the application affected is business critical
To add a risk management perspective to a SIM product you don’t necessarily have to collect any more logs. In most cases you can supplement your existing log data with information about the risk/criticality of the applications, systems, users and roles. For example, many SIM products give security analysts the ability to classify servers into multiple tiers of criticality with a numeric score (such as 1 through 5 where 1 is most critical and 5 is least) or labels such as high, medium or low. These additional attributes add another perspective to your results, allowing security professionals to prioritize business critical security events over non-critical events. The key distinction is that criticality is an attribute of the business, not a security attribute.
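The enrichment described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular SIM product: the host names, the `ASSET_CRITICALITY` table, the `Event` class and the default tier for unclassified assets are all hypothetical. It shows that no extra logs are needed; the existing events are simply ordered by a business attribute attached to the asset.

```python
from dataclasses import dataclass

# Hypothetical criticality tiers assigned by the business: 1 is most critical, 5 is least.
ASSET_CRITICALITY = {
    "payments-db": 1,
    "hr-portal": 3,
    "test-web": 5,
}
DEFAULT_TIER = 5  # assumption: unclassified assets are treated as least critical


@dataclass
class Event:
    host: str
    message: str


def criticality(event):
    """Look up the business criticality tier of the asset an event refers to."""
    return ASSET_CRITICALITY.get(event.host, DEFAULT_TIER)


def prioritize(events):
    """Return events ordered from most to least business critical."""
    return sorted(events, key=criticality)


events = [
    Event("test-web", "port scan detected"),
    Event("payments-db", "unusual query"),
    Event("hr-portal", "repeated failed logins"),
]
for event in prioritize(events):
    print(criticality(event), event.host, event.message)
```

Treating unclassified assets as least critical is only one possible choice; an organization that would rather investigate unknown systems first could set the default tier higher.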
Similarly, to adapt a SIM for regulatory compliance, you don’t necessarily need more logs. In many cases you must relate your existing SIM results to specific audit report requirements or rules. Most compliance regulations represent the minimum security controls that are appropriate in any industry. If you designed your SIM to support a robust security program, you probably already collect all the logs needed for compliance.
Some companies may want to implement real-time response to security events. Without a doubt, this is an ambitious goal for any company. Real-time response requires almost perfect execution on technology, process and people, and few companies are successful. If you embark on this ambitious goal, make sure it is the final goal of a well-established and successful SIM implementation, not the first goal. To make a SIM suitable for real-time correlation and analysis you must pick out a subset of your log data to correlate. Real-time analysis of too much log data will result in poor performance, but collection of too few logs will make post-incident analysis impossible, because there will be gaps in the record. A successful implementation splits the log data into a long-term archive, which is as comprehensive as possible, and a smaller log data feed for correlation and alerting.
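The split between a comprehensive archive and a smaller correlation feed might look like the following sketch. The entry format, the `route` function and the `auth_or_firewall` filter are assumptions made for illustration; a real deployment would write to storage and a message queue rather than to in-memory lists.

```python
def route(entries, is_relevant):
    """Archive every entry; copy only the relevant ones to the correlation feed."""
    archive, feed = [], []
    for entry in entries:
        archive.append(entry)      # long-term archive keeps everything
        if is_relevant(entry):
            feed.append(entry)     # smaller feed for real-time correlation and alerting
    return archive, feed


def auth_or_firewall(entry):
    """Hypothetical filter: only authentication and firewall logs are correlated live."""
    return entry["source"] in {"auth", "firewall"}


entries = [
    {"source": "auth", "msg": "Login failed: unknown user"},
    {"source": "auth", "msg": "Login succeeded"},
    {"source": "app", "msg": "cache refreshed"},
    {"source": "firewall", "msg": "deny tcp 10.0.0.5"},
]
archive, feed = route(entries, auth_or_firewall)
print(len(archive), "archived,", len(feed), "sent to correlation")
```

The filter is applied only on the path to the correlation feed, never on the path to the archive, so tightening it for performance does not create gaps in the historical record.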
The key to success is incrementalism: start with a limited and focused goal, then gradually add capabilities while making sure the SIM system is not overwhelmed with data. If you look at a SIM solution as a tool that is part of a broader operational process you will realize that the weakest link is not the technology, it’s the operators. If you set an overly ambitious goal for SIM, you will not only overwhelm the system with logs, you will overload the people with log results too. The operators of a SIM that is producing too many results and alerts have two options: throttle the results, thus risking a missed security event; or ignore the alerts. Many SIM deployment projects have succumbed to one or both of those reactions and have ended up disused or cancelled after a while.
SECURITY KAIZEN: USING SIM FOR CONTINUOUS QUALITY IMPROVEMENT
An interesting way to look at SIM is to consider it as the feedback loop of a security program. All the policies, configurations and security devices ultimately express themselves in the form of security events. A SIM is a great way to close the loop on your risk management program and use it to find out if the policies are working as they are supposed to.
To turn the SIM into a powerful feedback mechanism, security operations managers have to make it part of a continuous improvement program. Every event produced by the SIM hints at the success or failure of security in an organization. When a security professional reviews a security event, they have an opportunity to go beyond the event and examine the root causes in policy, process or controls. In Japanese manufacturing, the process of continuous quality improvement is called “kaizen”, a concept that can have tremendous implications for improving a security program.
At the first level of inspection, a security event can highlight omissions or errors in the log collection process. While reviewing an event, a security analyst can find that they are missing a crucial piece of information from a log that was not collected. In many cases, that insight is not acted upon and nothing changes. In a continuous improvement process, a security analyst will be able to submit a change request form to add a log to the collection set.
In the process of reviewing a security event, a security analyst may discover an earlier event that was critical but did not generate an alert. Again, there is an opportunity to improve the process by adding an alert pattern to the SIM. For example, an analyst may receive an alert about an inappropriate database access pattern, such as a SQL query that is unusual. While investigating the event the analyst notices that not only is the specific query unusual but that it was initiated from a different IP address and database user than the one expected by the application. While the SIM generated an alert only for the strange query, it could have alerted on the strange connection origin and credentials that preceded it. By adding this pattern, the analyst ensures that in future there will be an alert on the earlier event, allowing a security team to respond faster.
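As a rough illustration of the added pattern, the sketch below flags a database connection whose origin or credentials differ from what the application normally uses. The IP addresses, the `app_svc` account, the event fields and the `EXPECTED_CONNECTIONS` allow-list are all hypothetical; real SIM products express such rules in their own rule languages.

```python
# Hypothetical allow-list: (source IP, database user) pairs the application normally uses.
EXPECTED_CONNECTIONS = {("10.0.1.20", "app_svc")}


def connection_alert(event):
    """Return an alert message for an unexpected connection, or None if it looks normal."""
    pair = (event["src_ip"], event["db_user"])
    if pair not in EXPECTED_CONNECTIONS:
        return f"unexpected database connection from {pair[0]} as {pair[1]}"
    return None


normal = {"src_ip": "10.0.1.20", "db_user": "app_svc"}
suspicious = {"src_ip": "192.168.7.44", "db_user": "dba_admin"}

print(connection_alert(normal))
print(connection_alert(suspicious))
```

Because the rule fires at connection time, it would raise an alert before the unusual query is ever run.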
Most businesses sustain a frenetic pace of change, both in their business process and in the supporting infrastructure. Of course, change is the arch-enemy of security, because change introduces errors and errors introduce security risk. A SIM is often the first place that configuration errors show up, whether they trigger security alerts or just uncorrelated and seemingly spurious log entries. It is crucially important to follow up on change-related events in a SIM, not just because of possible security exposure created by the change but also because the change may have broken the SIM’s correlation and filtering patterns. For example, a routine upgrade in an application can change a log message from “Login failed: unknown user” to “Failed login: no such user.” To a human, both messages are equivalent but to a string pattern matching engine of a SIM, they are completely different. If you were getting alerts before the upgrade, you will no longer get alerts.
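The fragility of string matching is easy to demonstrate. In the sketch below, `exact_rule` stands in for a pattern written against the old message and `tolerant_rule` for a slightly more forgiving one; both function names and the regular expression are illustrative assumptions, not the syntax of any specific SIM.

```python
import re

OLD_MESSAGE = "Login failed: unknown user"
NEW_MESSAGE = "Failed login: no such user"


def exact_rule(line):
    """Pattern written against the pre-upgrade message text."""
    return "Login failed: unknown user" in line


# Accepts either word order, ignoring case.
FAILED_LOGIN = re.compile(r"\b(login failed|failed login)\b", re.IGNORECASE)


def tolerant_rule(line):
    """Pattern that matches both the old and the new wording."""
    return FAILED_LOGIN.search(line) is not None


for message in (OLD_MESSAGE, NEW_MESSAGE):
    print(message, "| exact:", exact_rule(message), "| tolerant:", tolerant_rule(message))
```

A more tolerant pattern only postpones the problem, since a future upgrade could change the wording again. Replaying known sample messages through the rules after every application change is one way to catch a silent break.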
At the most fundamental level, however, a SIM gives you an opportunity to improve the policies and processes, not just the technology. If you collect different logs, change the patterns or introduce a new pattern you will improve the SIM. But if you use the SIM to identify broken or inefficient processes or misapplied security policies, you will improve the company’s overall security, not just the SIM.
Security analysts need more than a change request form to fix policies and processes in a security organization. To get that extra layer of kaizen you must conduct a post-incident review with the explicit goal of process improvement. You need to look at every security incident, from a simple policy violation by a well-meaning employee to a catastrophic breach, as an opportunity to review your existing policies and procedures with an eye for improvement.
Security information management tools are an important part of a security organization’s arsenal. They are so versatile that many companies are led to overreach, trying to do far too much far too quickly. As a result, many SIM deployments fail because of performance problems, human operator overloading or ballooning costs without significant results. To avoid this outcome, start small, focus on a single goal and gradually expand the scope as you build on success.
Andreas M. Antonopoulos is a senior vice president and founding partner with Nemertes Research, where he develops and manages research projects, conducts strategic seminars and advises key clients. Andreas is a computer scientist, a master of data communications and distributed systems, and a Certified Information Systems Security Professional (CISSP), with an engineering, programming and consulting background. For the past 16 years, he has advised a range of global industries on emerging technologies and trends.