Incident response was tough enough when the challenge was getting to the bottom of what happened. For most organizations, when an incident is detected or suspected, gathering enough data to reconstruct what happened requires several hours of piecing logs together. The reason is simple: The majority of security appliances report what happened, but not who was behind the activity, historical information about that system or...
But today, regulatory compliance requirements are built on a strong security rationale for tying identity to activity. The reality is that compliance is driving organizations to do log management, and tying identity to activity helps get budget. Sarbanes-Oxley (SOX), for example, calls for strict controls over access to financial records, and that means it's critical to spot unauthorized activity by human beings.
"Organizations that perform log analysis are constantly reacting to events on the network, while still trying to be proactive," says Ron Gula, CTO Tenable Security. "When logs are tied to user identities, if there is a critical event, the user (or likely user) of the event can be quickly identified." User identity is a critical piece of information that shortens the analysis decision cycle and helps eliminate unimportant issues or gives us a high confidence for the events we mark as actionable priorities. For example, he says, "you may have no idea how many login failures constitutes a probe, but if you were to graph all of the login failures by user, you may be able to spot patterns you didn't know you had to look for in the first place."
Knowing the "who" as well as the "what" is more than a benefit for investigators; it is absolutely essential to an organization's security/compliance program. Who made unauthorized access to customer information databases? Who attempted to get root privileges on the domain server? Who cooked the financial records?
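Gula's suggestion of graphing login failures by user is easy to try once events carry a user name. The following is a minimal Python sketch, not any vendor's method; the EVENTS list and the user names in it are made up for illustration.

```python
from collections import Counter

# Hypothetical events that have already been tied to a user name.
EVENTS = [
    ("alice", "login_failure"),
    ("bob", "login_success"),
    ("alice", "login_failure"),
    ("carol", "login_failure"),
    ("alice", "login_failure"),
]

def failures_by_user(events):
    """Return (user, failure_count) pairs, highest count first."""
    counts = Counter(user for user, outcome in events if outcome == "login_failure")
    return counts.most_common()

# Crude text "graph": one '#' per login failure.
for user, count in failures_by_user(EVENTS):
    print(f"{user:<8} {'#' * count} ({count})")
```

Even a crude chart like this makes an outlier stand out without anyone having to decide in advance how many failures constitute a probe.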
A classic compliance-related case of tying activity to identity comes from improper access to medical records. Some of these cases get a lot of press, such as when 13 employees at UCLA Medical Center improperly accessed Britney Spears' records in March 2008, but professionals in the field report that this is fairly common. The stolen information can be sold, not only to sensationalist tabloids as in the case of celebrities such as Spears, Maria Shriver and George Clooney, but also to insurance firms.
Needless to say, this has the potential to put medical institutions at risk of both lawsuits for breach of privacy or emotional distress, and HIPAA compliance violations. The Department of Health and Human Services has not done a good job of enforcing HIPAA compliance to date, but that's changing with the recent $2 million CVS fine and the Obama Administration's emphasis on strong enforcement.
Tying user identity to activity is no easy task, but we're finally seeing the tools and developing the techniques that make it practical to track down the inadvertent or malicious offender.
Tracking Human Events
Why is tying identity to activity so difficult? At the heart of the problem is the "skinny" or "thin" event report (a term coined by Eric Fitzgerald of Microsoft). A computer, server, or security appliance kicks out a report to syslog with the information it has at hand. It can't gather any other information about the event, state information, the person logged in and so forth. The result is logs that typically give:
- Time and date of the event.
- IP Address or possibly hostname(s) involved.
- The program reporting the event.
- Severity. Common values are Fatal, Severe, Warning, Info, Debug, which are decided by the application and may or may not be accurate or useful.
- What happened from the reporting program's point of view.
Let's look at an example from Suhosin, a hardened version of the Hypertext Preprocessor (PHP):
Feb 24 09:56:43  ALERT - tried to register forbidden variable 'GLOBALS' through GET variables (attacker '126.96.36.199', file '/srv/www/live/sans/public_html/newsletters/risk/index.php')
Each of those fields is useful and necessary, but not sufficient. What is missing? To do a complete analysis, we generally need "fat" data--additional information that may not be available to the reporting program. Additional fields that are commonly needed to create actionable information from event data include:
- When the event happened: Feb 24 09:56:43 East Coast Time.
- Who initiated the activity: '188.8.131.52', according to nslookup, was assigned to webhost3.shadowrain.co.za at that time.
- Whether this is a stimulus or a response: It is a stimulus in this case, because webhost3 is initiating connections with www.sans.org.
- If the event we have collected is a response, have we identified the stimulus--or, in this case, since it was a stimulus, did we respond?
- What individuals and programs were involved? Ah, there is the rub; we know the IP address, we know the machine name, but we have no idea who in South Africa is behind this activity.
- Did each event in the chain succeed or fail? This log entry is one of a series; webhost3 is probably running a scanner on www.sans.org. Hopefully, each of the probes fails.
- Is this over or ongoing? This probe has a start time and end time, so the event is over. We can only surmise that by looking at all the log entries from this IP address.
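To make the gap concrete, here is a minimal Python sketch that pulls the thin fields out of the Suhosin entry above and lists what is still unanswered. The regular expression is written for this one message layout only, and the names in FAT_FIELDS are illustrative labels for the questions in the list, not fields from any product.

```python
import re

# Pattern for the single Suhosin message layout shown above (an assumption;
# other messages from the same program may be laid out differently).
THIN_EVENT = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>[A-Z]+)\s+-\s+"
    r"(?P<message>.*?)\s+"
    r"\(attacker '(?P<attacker>[^']+)', file '(?P<file>[^']+)'\)$"
)

# Illustrative names for the "fat" questions a thin event cannot answer.
FAT_FIELDS = ["time_zone", "user", "stimulus_or_response", "outcome", "ongoing"]

def parse_thin_event(line):
    """Return the thin fields as a dict, or None if the layout is unknown."""
    match = THIN_EVENT.match(line.strip())
    return match.groupdict() if match else None

LINE = (
    "Feb 24 09:56:43  ALERT - tried to register forbidden variable 'GLOBALS' "
    "through GET variables (attacker '126.96.36.199', "
    "file '/srv/www/live/sans/public_html/newsletters/risk/index.php')"
)

event = parse_thin_event(LINE)
missing = [name for name in FAT_FIELDS if name not in event]
print(event["timestamp"], event["severity"], event["attacker"])
print("still missing:", ", ".join(missing))
```

Everything the parser recovers was already in the log line; every item in the missing list has to come from somewhere else.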
For years, putting the data together has been the responsibility of the security analyst. We flag an event in syslog because it has a key word we know indicates suspicious activity, such as "rejected", "dropped" or "denied." Then, we take the information that we have from the syslog entry and begin to work both backwards and forwards to find other related log events. Perhaps we have the IP address and need to consult the DHCP table to determine the host name and MAC address.
Caveat Log Analysts
Your conclusions are only as good as your data.
Any data modeling professional will quickly warn you that referential data is powerful and helpful for analyzing and classifying an event if, and only if, that information is correct and is correlated correctly. If you visualize yourself as the analyst making decisions on how to classify an event, you can clearly see that if these types of fields are misleading or wrong, you could easily arrive at the wrong conclusion. As an example, suppose you were an analyst for a university investigating a log event:
Feb 25 02:55:19  ALERT - configured request variable name length limit exceeded - dropped variable '___df9d5760ba1af926bed589c89//modules/My_eGallery/index_php?basepath' (attacker '10.12.82.4', file '/srv/www/live/college/public_html/new/CS423/grades/display.php'
The login information for IP address '10.12.82.4' yielded a student name of John Brown, and the event history showed past warnings for hacking-type behavior. One might immediately leap to a conclusion that the event was hacking-related and John Brown was at it again.
However, if any of that information was wrong, or correlated incorrectly, we might accuse John unfairly. What if John had plugged a wireless access point into the network connector in his dorm room and another student was using it while attempting to access the grades for his class? In fact, still another piece of referential data showed that John Brown was not even enrolled in CS 423. Why would you hack the grade server to change your grade for a class you aren't taking? --Stephen Northcutt
Next, we might go to the system or domain controller event logs to determine who was logged on. Did they log on the first time they tried, or were there multiple attempts? Where did they log on from: Were they local, or was it a remote log on? This type of network forensic analysis is doable, but it takes a long time and a complete knowledge of where to get the information.
Each event may take between 30 minutes and several hours to run to ground, and the work is somewhat tedious, especially when we have to work with data in different time zones. The high cost of manual correlation means many potential incidents are never investigated, and that means we fail to detect some events, sometimes with devastating consequences, such as the spectacular Barings Bank and Societe Generale frauds.
On the other hand, if we can use software to collect this information and display it in a meaningful way, an analyst can make a pretty good decision as to the severity of a log event in a matter of seconds, and our ability to detect and respond to potentially harmful events improves dramatically.
The keys will lie in our analysts' ability to look for changes in user behavior or attitude; report on segregation of duties, dual controls and access violations, and monitor activity and report on it. The good news is that we're getting the tools that are beginning to make this practical.
Tools Track Users
Since the stakes are so high and the need to tie identity to activity is so great, vendors are starting to deliver security solutions that can help. For instance, Sourcefire Real-time User Awareness (RUA) can be configured to send an alert any time a new user identity is detected, and this identity can be checked to see if it matches specific values.
Take the "Zippy" example. (This really happened. Though famous bank disasters are among the most serious account-related breaches, most security professionals with a couple of years of operational security experience have a security story involving a new or modified account.) The company was a lab in which user names were all created from the first letter of the first name and the first six letters of the last name. A new account log entry for "zippy" caught our attention immediately. Either we had an employee named Zeke Ippy or we had a problem.
Account abuse did in many banks.
Failure to detect and monitor new accounts or use of excessive privilege is a critical example of the need to tie activities to users and their roles. Consider these spectacular examples.
One such failure led to the 1995 demise of the venerable Barings Bank, the oldest merchant bank in the UK. Account 88888 had been set up to cover up a mistake made by another team member, which led to a loss of $20,000. That is bad, but it gets worse. Nick Leeson then used this account to cover his own mounting losses as a day trader. When the smoke cleared, Leeson had lost $1.3 billion and ultimately destroyed the 233-year-old bank. All of Leeson's supervisors resigned (under pressure) or were terminated.
Jerome Kerviel, a trader with the French Societe Generale bank, had access that allowed him to far exceed his authority in European stock index trades. He was able to make unauthorized transactions that led to a loss of somewhere in the neighborhood of 4.9 billion Euros (more than $7 billion US).
In 2006, Kerviel began a series of fake trades mixed with large real trades, some of which actually exceeded the bank's capitalization. Somehow, he avoided normal controls based on timing and keeping winning and losing trades in balance to give the appearance of insignificant impact to the bank's bottom line. A number of DLP-friendly tools as well as simple scripts can help us detect new accounts. --Stephen Northcutt
If we had a list of all users, we could examine zippy to see if any user had a first name starting with "Z" and a last name with the string "Ippy." This can be done with a home-grown script using regular expressions, but over time, we're seeing vendors deliver more regular expression capability so that tools can be configured to support business logic.
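A home-grown check of this kind might look like the following Python sketch. It assumes the lab's naming rule as described above (first initial plus the first six letters of the last name); the employee roster and account list are invented for illustration.

```python
import re

def expected_username(first_name, last_name):
    """First initial plus up to six letters of the last name, lowercased."""
    last_letters = re.sub(r"[^a-z]", "", last_name.lower())
    return first_name[0].lower() + last_letters[:6]

def unexpected_accounts(accounts, employees):
    """Return accounts that no employee's name could have produced."""
    allowed = {expected_username(first, last) for first, last in employees}
    return sorted(acct for acct in accounts if acct.lower() not in allowed)

# Hypothetical roster and account list.
EMPLOYEES = [("Jane", "Anderson"), ("Raj", "Patel")]
ACCOUNTS = ["janders", "rpatel", "zippy"]

print(unexpected_accounts(ACCOUNTS, EMPLOYEES))
```

If the lab really did employ a Zeke Ippy, adding him to the roster would clear the alert, which is exactly the business logic we want the tool to encode.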
Security architects can now depend on one or more categories of logging and analysis tools that deliver "fat" data, tying user ID and other related information to event logs. These tools include:
- Security information and event managers (SIEMs).
- Log management devices, which are primarily collectors of log files.
- Central consoles from security product companies that offer a number of additional capabilities, not just logging and analysis. For example, both Tenable and Sourcefire have several security products that report in to central consoles and strive to deliver fat data.
These products receive the thin events and create fat data for analysis. As the vendors continue to add functionality, these product categories tend to overlap and are less defined than they were a couple of years ago. SIEMs, for example, are now emphasizing their log management capabilities (or spinning off separate products) to capitalize on compliance-driven market demand. And, some log management products are developing more SIEM-like capabilities.
The flow goes like this. An event occurs, and a thin log file describing that event is created and sent to a collector. (A site may have one or more collectors.) The collector may store it as a raw, unaltered, pre-normalization event. The log event may also be stored with a matching cryptographic hash to prove it has not been tampered with.
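The hashing step can be sketched in a few lines of Python. This is an illustration of the idea, not how any particular collector does it; the function names and the in-memory archive list are assumptions. Note that a hash stored next to the record only proves integrity if the hash itself is protected, for example by keeping it on a separate system.

```python
import hashlib

def store_raw_event(raw_line, archive):
    """Append the unaltered event and its SHA-256 digest to the archive."""
    digest = hashlib.sha256(raw_line.encode("utf-8")).hexdigest()
    archive.append({"raw": raw_line, "sha256": digest})
    return digest

def is_untampered(record):
    """Recompute the digest and compare it with the stored one."""
    current = hashlib.sha256(record["raw"].encode("utf-8")).hexdigest()
    return current == record["sha256"]

archive = []  # stand-in for the collector's raw event store
store_raw_event("Feb 24 09:56:43  ALERT - example thin event", archive)
print(is_untampered(archive[0]))
```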
If the site wants to do more than simply store the log, a copy of the log event is sent to an analysis engine. The log event can be evaluated by rules that are designed to either confirm and record normal events, or designed to detect abnormal or bad events.
The rules may be based on regular expression technology to parse raw events, but sophisticated products normalize the logs. Normalizing breaks down raw data into component standardized fields, stored in a database, so we may be able to correlate with other information. Examples of the types of fields we might see in an event database include day of week, hour of day, ID, UTC time, local time, time zone, PID, OS name, OS version, application version, host name, host IP, host domain name, MAC address, application reason, severity type.
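As a rough illustration of normalization, the Python sketch below turns a few parsed values into some of the standardized fields listed above. Syslog timestamps of this style carry neither a year nor a time zone, so both are supplied by the caller here; the year 2009 and the UTC-5 offset in the example are assumptions, as are the function and field names.

```python
from datetime import datetime, timedelta, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def normalize(timestamp, year, utc_offset_hours, host_ip, severity, reason):
    """Break a parsed thin event into standardized, database-ready fields."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # Month abbreviations are parsed using the default (English) locale.
    local = datetime.strptime(f"{year} {timestamp}", "%Y %b %d %H:%M:%S")
    local = local.replace(tzinfo=local_zone)
    return {
        "utc_time": local.astimezone(timezone.utc).isoformat(),
        "local_time": local.isoformat(),
        "utc_offset_hours": utc_offset_hours,
        "day_of_week": DAYS[local.weekday()],
        "hour_of_day": local.hour,
        "host_ip": host_ip,
        "severity": severity.lower(),
        "application_reason": reason,
    }

row = normalize("Feb 24 09:56:43", 2009, -5, "126.96.36.199", "ALERT",
                "tried to register forbidden variable 'GLOBALS'")
for field, value in row.items():
    print(f"{field}: {value}")
```

Storing both UTC and local time removes most of the time zone arithmetic that makes manual correlation so tedious.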
Once the data is normalized and in a database, our tools create a fat event by adding other referential data such as the history of that IP address/MAC address/system name; related vulnerability scan information; the history of similar events; and login, identity or access data. This level of information will help the analyst make an informed decision much faster. One warning note: Information isn't always what it seems, so don't leap to obvious conclusions about what the data appears to be telling you.
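In code, creating a fat event amounts to joining the normalized event against referential tables. The Python sketch below uses three invented lookup tables (DHCP leases, login sessions and vulnerability scan results) keyed by IP address; all names and values are hypothetical. The user field is deliberately called likely_user, because the join is only as trustworthy as the tables behind it.

```python
def enrich(event, dhcp_leases, logins, vuln_scans):
    """Return a copy of the event with referential data attached."""
    ip = event["host_ip"]
    lease = dhcp_leases.get(ip, {})
    fat = dict(event)  # leave the normalized event itself untouched
    fat["hostname"] = lease.get("hostname")
    fat["mac"] = lease.get("mac")
    fat["likely_user"] = logins.get(ip)  # a lead for the analyst, not proof
    fat["known_vulnerabilities"] = vuln_scans.get(ip, [])
    return fat

# Hypothetical referential tables.
DHCP_LEASES = {"10.12.82.4": {"hostname": "dorm-417", "mac": "00:11:22:33:44:55"}}
LOGINS = {"10.12.82.4": "jbrown"}
VULN_SCANS = {"10.12.82.4": ["outdated web browser"]}

thin = {"host_ip": "10.12.82.4", "severity": "alert"}
fat = enrich(thin, DHCP_LEASES, LOGINS, VULN_SCANS)
print(fat)
```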
Since referential data is important, organizations that take log analysis seriously want as much of it as they can get. One useful tool is the passive sniffer. These tools are typically placed near aggregation points such as the firewall and listen to and analyze the traffic passing by. They are able to determine what operating systems are associated with particular addresses. They also can determine the version of software that is running. This is a huge step up from the basic firewall log of port and IP address. In addition, they can pinpoint the existence of vulnerabilities. Because they are creating their referential state tables by listening to traffic, they are more current than static network inventory tables that are manually updated. There is an open-source example called P0f, and Sourcefire and Tenable Security have commercial products--Sourcefire Real-time Network Awareness (RNA) and Tenable Passive Vulnerability Scanner. Both companies offer a central console, sort of a mini-SIEM, to collect and manage the event data their various products create. Identifying the event in syslog and querying these vendor consoles is still a manual process, but it's a huge step up from everything being manual.
With sophisticated SIEMs, it is becoming increasingly possible to tie thin events to an identity in useful ways. It's been hard to do previously because the average person has multiple accounts--email, Windows, VPN, intranet, app-specific IDs, IM, etc. While a SIEM can collect activity across these accounts, for the data to be actionable, we must associate all of these accounts with a single person. Using ArcSight ESM, for example, an analyst selects one account ID as the user's unique ID. Then it is possible to map all the other accounts for that user to the unique ID. SIEMs such as ESM use several methods to connect log activity to identity, including agents and sending native operating system credentials.
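The mapping idea itself is simple, whatever product implements it. The Python sketch below is a generic illustration and not ArcSight's implementation; the account names, the unique ID "janders" and the sample events are all made up.

```python
from collections import defaultdict

# Hypothetical map from (system, account name) to one unique ID per person.
# Account names are stored in lowercase so lookups are case-insensitive.
ACCOUNT_MAP = {
    ("email", "jane.anderson@example.com"): "janders",
    ("windows", "corp\\janders"): "janders",
    ("vpn", "jane_a"): "janders",
}

def activity_by_person(events, account_map):
    """Group (system, account, action) events under each person's unique ID."""
    grouped = defaultdict(list)
    for system, account, action in events:
        person = account_map.get((system, account.lower()), f"unmapped:{account}")
        grouped[person].append(action)
    return dict(grouped)

EVENTS = [
    ("vpn", "jane_a", "remote login"),
    ("windows", "CORP\\janders", "opened customer database"),
    ("email", "jane.anderson@example.com", "sent large attachment"),
    ("im", "zippy", "file transfer"),
]

for person, actions in activity_by_person(EVENTS, ACCOUNT_MAP).items():
    print(person, actions)
```

Seen account by account, the three janders events look unrelated; grouped under one identity, they tell a story worth a second look. Unmapped accounts are kept rather than dropped, since they may be the most interesting ones.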
The only way to detect changes in behavior with technical controls is to tie identity to activity over a long enough period of time to establish a baseline. What if the amount of Web connection time to social media such as Twitter and Facebook suddenly increases? It might indicate that user is wasting time instead of working. Or, a major increase in time on LinkedIn might indicate establishing connections in advance of leaving the current organization. However, there is no way to detect an increase if we do not have a baseline.
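A baseline check can be as simple as comparing today's value with the mean and standard deviation of the user's own history. The Python sketch below is one simple statistical approach among many; the threshold of three standard deviations and the sample minutes are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def exceeds_baseline(history, today, threshold=3.0):
    """Return True if today's value is far above the user's own baseline.

    Returns None when there is too little history to form a baseline.
    """
    if len(history) < 2:
        return None
    average, spread = mean(history), stdev(history)
    if spread == 0:
        return today > average
    return (today - average) / spread > threshold

# Hypothetical minutes per day one user spent on social media sites.
HISTORY = [12, 15, 10, 14, 11, 13, 12]

print(exceeds_baseline(HISTORY, 55))
print(exceeds_baseline(HISTORY, 14))
print(exceeds_baseline([], 55))
```

The last call returns None rather than True or False: with no baseline, there is no way to say whether 55 minutes is an increase.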
You can expect a SIEM that supports identity-to-activity mapping to be able to integrate with Active Directory or Network Directory. This means that in addition to the accounts, you also get group or role information. Even though organizations have been slow to implement network access control (NAC) at the enterprise level, the capability is built into more and more software and appliances, and it is starting to happen.
One exciting capability of tying identity to activity is feeding historical activity data into ArcSight's activity profiling technology to generate statistical patterns and create new rules. For example, you might run the activity of the last 50 people who quit to see what they did that those who are still there didn't do. Then if you see that activity, you can auto-escalate the user to a watch list and make sure the person doesn't leave with data, files, etc.
Or, in a down economy, if you have to announce that your organization can't issue bonuses one year, you might profile the activity of users before the announcement compared to after the announcement. A recent study by The Ponemon Institute (sponsored by Symantec) interviewed 945 U.S. adults who had been laid off, fired, or changed jobs within the last year and found that more than half took company information with them when they left.
The rationales for taking the data included help getting another job, help starting their own business, or simple revenge. All of the participants in the survey had access to proprietary information, including customer data, employee information, financial reports, software tools and confidential business documents. The survey also found that just 15 percent of the companies examined the paper and/or electronic documents their former employees took with them when they left.
Every organization struggles with the amount of effort it takes to get real benefit from log file analysis. Obviously, one big win is compliance. Most regulatory bodies either require or strongly suggest log monitoring. The Consensus Audit Guidelines specifically refer to the importance of tying identity to activity. Two examples are enforcing controls on dormant accounts and continuously evaluating need to know. In both cases, you have to know who the user is and what his role should be.
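The dormant account control is a good example of how little code is needed once logins are tied to users. The Python sketch below assumes a table of last login dates per user; the 90-day limit, the dates and the account names are illustrative choices, not values taken from the guidelines.

```python
from datetime import date, timedelta

def dormant_accounts(last_login, today, max_idle_days=90):
    """Return accounts whose most recent login is older than the idle limit."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(user for user, seen in last_login.items() if seen < cutoff)

# Hypothetical last-login dates pulled from identity-aware logs.
LAST_LOGIN = {
    "janders": date(2009, 5, 28),
    "rpatel": date(2009, 2, 20),
    "oldtemp": date(2008, 11, 3),
}

print(dormant_accounts(LAST_LOGIN, today=date(2009, 6, 1)))
```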
With log monitoring, nothing succeeds like success. Think of the value of an analyst who takes the time to run a suspicious event to ground and finds something significant, such as an employee collecting a list of customers' personally identifiable information and sending it to his Hotmail account. The damage can be minimized by rapid detection and response. Logging, which is usually considered dull and boring work, becomes exciting.
That is really one of the biggest benefits of tying identity to activity. Hits on the firewall, spam messages dropped, error conditions in a program, the amount of free disk space, are all important, of course. Humans, though, do the craziest things, and when you add the human part of the equation to log events, it is a whole new ball game. It wouldn't be surprising if the next few years yield a number of exciting security detection techniques as we correlate identity and get better at creating fat events for analysts to review.
Stephen Northcutt founded the GIAC certification and currently serves as president of the SANS Technology Institute, a post graduate level IT Security College. He is author/coauthor of Incident Handling Step-by-Step, Intrusion Signatures and Analysis, Inside Network Perimeter Security 2nd Edition, IT Ethics Handbook, SANS Security Essentials, SANS Security Leadership Essentials and Network Intrusion Detection 3rd edition.