The first challenge is centrally gathering and storing these logs; luckily, there are a number of products available for this purpose. Logs are usually shipped off to a syslog, log management or SIM system that is centrally located on the network. So the big question is: How do you sift through Web server log data and find relevant security information?
Although there are many different open source and commercial software applications that perform some level of log analysis, one thing is usually common among them: support for searching with regular expressions (regex).
We will focus on the use of regex with egrep. Egrep uses a very simple syntax for searching files and is readily available on nearly every operating system in common environments today. (Windows users can download a free version from a variety of sources.)
Keep in mind that the regex syntax used with egrep is also largely compatible with any program or scripting language that supports regex.
For this article, we'll look at Apache logs. But the concepts applied via egrep, regex and httpd logs can be used across hundreds of other platforms, tools and log types. Understanding what is dangerous and how to search for it is a great step toward recognizing security issues within your organization.
Step 1: Web log format
In order to create expressions to analyze the contents of these logs, we need to understand the log entry structure. Apache stores a server access log, usually in /etc/httpd/logs, typically named access_log. You can configure httpd (Apache) to send these logs to a syslog or SIM system; if so, your log format may differ from the default. By default, Apache stores newline-delimited entries in access_log in the following format:
10.10.10.10 - frank [10/Oct/2007:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
Let's break this down section by section. The first value, 10.10.10.10, is simply the client IP address; if HostnameLookups is enabled, the client's hostname is recorded here in its place. The hyphen that follows is the RFC 1413 identity of the client, which is rarely available (hence the dash), and frank is the userid of the person making the request, as determined by HTTP authentication. Next, we have the date and time stamp, 10/Oct/2007:13:55:36 -0700. This is obviously important for correlation purposes.
Next, we have the HTTP request information. This is especially helpful because it gives us details about what request was made by the client. In this case, "GET /apache_pb.gif HTTP/1.0" indicates a GET request targeting the image file named apache_pb.gif located in the root of the Web server's document directory.
Finally, the server return code, 200, indicates the request was completed successfully. The last bit of information is simply the size, in bytes, of the object returned to the client for that request.
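This default layout is Apache's Common Log Format. For reference, a minimal httpd.conf sketch that would produce entries like the one above might look like this (the log path and the nickname common are illustrative):

LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

Here %h is the client host, %l the RFC 1413 identity, %u the authenticated userid, %t the time stamp, %r the request line, %>s the final status code and %b the size of the response in bytes.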
Step 2: Search the server logs
Now that we understand the breakdown of the log format, we can begin to determine ways to check for requests that indicate suspicious activity. For example, we can look for requests that call for admin components such as Webmin, a Web-based server administration tool, or admin, a common login interface name. These names will most likely appear as part of the request details in the log. With this in mind, we can simply place one of these names as a search string into egrep:
>egrep -n webmin access_log
The structure of this is simple: egrep, followed by any configuration parameters, followed by the search criteria, followed by the name of the file to be searched.
In this case, -n displays the line number of each matching log entry for reference purposes.
This should produce any server log entries where a request was made to a URL containing webmin. An example return would look like:
57:10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
Breaking down our result: on line 57 of the log file, a request was made to our Web server at 8:24 p.m. on Oct. 10, requesting the webmin directory. We can also see the server returned a 404 code, indicating it was unable to locate the directory. This is important because someone with legitimate access to administrative functions on the server would know where to look; Bob could be searching for a way to break into the server.
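Because egrep's syntax supports alternation with the | character, we can also check several suspicious paths in a single pass. A quick sketch (this list of names is illustrative, not exhaustive):

>egrep -n -i "webmin|admin|manager|login" access_log

This returns every log line containing any one of those strings.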
Step 3: Refine your server log search
It may be of interest to search for other requests by Bob, specifically ones that returned a 200 code, indicating that he found something. Our command could look something like this:
>egrep -n -i "bob|200" access_log
Although this will find log entries that have bob or the integer 200 somewhere in them, it doesn't mean every line returned will be a 200 server code on a request made by Bob. This will actually return quite a bit of data we don't really want. It would be more accurate to search for lines containing both bob and 200. Because bob appears as a distinct word in the log, we can use word boundaries to further isolate the requests we are looking for. Also note the -i parameter, which removes the case-match requirement so that Bob, bOb, boB, bob and BOB all match our regex query.
>egrep -n -i "\bbob\b.*200" access_log
This command restricts our query to lines in the log that contain both the word bob and, somewhere after it, the number 200. The \b that you see on both sides of bob indicates a word boundary, the start or end of a word, so only the standalone word bob will match. The .* between bob and 200 matches any run of characters (including none), allowing anything to appear between the two. This would return entries such as these:
57:10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
59:10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726
65:10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726
Also, notice that Bob's requests were all met by 404 "not found" messages. If that is the case, then why did they show up? We asked for only 200 codes, right? This is a prime example of a computer doing exactly what you tell it to do: in this case, the date-time stamp happens to contain the string 200 as part of the year 2007, and that is what we asked for. Regex searches often produce false positives like this, but a simple refinement of the query eliminates most of them.
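One such refinement, assuming the default log layout shown earlier, is to anchor the 200 to the position where the status code actually appears, immediately after the quoted request string:

>egrep -n -i '\bbob\b.*" 200 ' access_log

The literal quote character and surrounding spaces keep the year in the time stamp from matching, since only the status code follows a closing quote in this format.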
Let's investigate Bob a little further.
Step 4: Follow the trail
As a last-ditch effort to track all of Bob's activities, we can search for all requests that Bob made from his IP address. This requires escaping the periods in the IP address as part of the regex. Escaping is a method of telling a regex engine to use a character as a literal search term instead of applying its special meaning; an unescaped period, for example, matches any single character. Note the command below:
>egrep -n -i "10\.10\.10\.10" access_log
In this case, we are telling egrep to find all instances of 10.10.10.10 in the log file. Our results will look much like this:
57:10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
59:10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726
65:10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726
120:10.10.10.10 - - [10/Oct/2007:21:14:11 -0700] "GET /index.html HTTP/1.0" 200 2571
157:10.10.10.10 - - [10/Oct/2007:21:50:59 -0700] "GET /parent/directory HTTP/1.0" 404 726
260:10.10.10.10 - - [10/Oct/2007:22:25:15 -0700] "GET /support.htm HTTP/1.0" 200 1056
So now we have a pretty good idea that Bob is poking around the site, though he hasn't necessarily violated any laws or crossed any boundaries. Still, it's a good idea to continue watching for log entries like these.
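If you suspect this kind of probing may come from neighboring addresses as well, the same escaping technique extends naturally with a bracket expression. A sketch (the 10.10.10.0/24 network is illustrative) that matches any client on that subnet:

>egrep -n "^10\.10\.10\.[0-9]{1,3} " access_log

The ^ anchors the match to the start of the line, where the client address appears, and [0-9]{1,3} allows any value in the final octet.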
Using Web log data to stay alert
When looking for more dangerous attack indicators, keep an eye on the frequency and destination of requests. For example, when monitoring an online banking application, pay particularly close attention to requests sent to transfer pages. We may see several entries like these when someone is trying to view others' transfer records:
10.10.10.10 - [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12345 HTTP/1.0" 200 1042
10.10.10.10 - [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12346 HTTP/1.0" 500 798
10.10.10.10 - [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12347 HTTP/1.0" 200 1042
10.10.10.10 - [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12348 HTTP/1.0" 500 798
Here we can see that someone noticed the id=xxxxx parameter in the URL and tried incrementing the number by one until they found other customers' transfer records. This is a serious breakdown in the security of the Web application and most certainly something you will want to catch when analyzing your logs.
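A quick way to spot this kind of enumeration in a large log is to count how many transfer requests each client address has made; an unusually high count stands out immediately. A sketch built around the hypothetical transfer.jsp path above:

>egrep 'GET /banking/view/transfer\.jsp\?id=' access_log | cut -d' ' -f1 | sort | uniq -c | sort -rn

Here cut -d' ' -f1 extracts the client IP address from each matching line, and sort | uniq -c | sort -rn prints a per-address request count with the busiest clients first.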
Brad Causey is a senior security analyst, author, and Web security engineer. He holds the following certifications: MCP, MCDST, MCSA, MCDBA, MCSE, MCT, CCNA, Security+, Network+, A+, CTT+, IT Project+, C|EH, GBLC, GGSC-0100, CIFI, and CISSP.
This was first published in April 2009