How to spot attacks through Apache Web server log analysis

Log analysis requires refined search skills that will help you ferret out security issues. Brad Causey explains how to sift through log data and find the relevant security information.

It seems every new device, appliance and even desktop software program has the capability to generate logs or text-based data. There are a number of challenges associated with managing the onslaught of log data.

The first is centrally storing and gathering these logs; luckily, there are a number of available products for this. Logs are usually shipped off to a syslog, log management or SIM system that is centrally located in the network. So the big question is: How do you sift through Web server log data and find relevant security information?

Although many open source and commercial applications perform some level of log analysis, one thing is usually common among them: regular expressions (regex). A regular expression is a string of characters that describes a search pattern, and nearly any scripting language or search tool can use one to run fast, advanced searches against large amounts of text data. There are a few variations of regex syntax. The one most commonly used by scripting languages is derived from Perl's, and Perl-style regular expressions are supported by the .NET Framework, Python, Java, JavaScript and, of course, Perl. By using regex in combination with a scripting language or search tool, you can quickly and efficiently parse large amounts of data for meaningful information.

One of the most common log formats we see issues in is that of the Apache Web server, or httpd. These Web server logs often hide information that is vital to find, such as attack attempts, signatures of successful attacks, and even the activity that precedes an impending attack.

We will focus on using regex with egrep. Egrep has a very simple syntax for searching files and is available on virtually every Unix-like operating system (Windows users can download a free version from a variety of sources). Egrep is equivalent to grep -E, which newer releases of GNU grep recommend using instead.

Keep in mind that the regex syntax egrep uses (POSIX extended regular expressions) carries over, with few changes, to most other programs and scripting languages that support regex.

For this article, we'll look at Apache logs. But the concepts applied via egrep, regex and httpd logs can be used across hundreds of other platforms, tools and log types. Understanding what is dangerous and how to search for it is a great step toward recognizing security issues within your organization.

Step 1: Web log format
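To write expressions that analyze the contents of these logs, we first need to understand the structure of a log entry. Apache keeps a server access log, typically named access_log. Its location varies by platform. /etc/httpd/logs (a link to /var/log/httpd on Red Hat-based systems) is common. Debian-based systems use /var/log/apache2, where the file is named access.log.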

You can configure httpd (Apache) to send these logs to a syslog or SIM system; if you do, your log format may differ from the default. Apache writes one entry per line to access_log. In the Common Log Format, an entry looks like this:

- frank [10/Oct/2007:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

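Let's break this down section by section. The first value is the client's IP address, or its hostname if HostnameLookups is enabled. The hyphen that follows is the client's RFC 1413 identity, which is almost never available. Next, frank is the user ID of the person requesting the document, as determined by HTTP authentication. After that comes the date and time stamp, 10/Oct/2007:13:55:36 -0700, which is obviously important for correlation purposes.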

Next comes the request line, in double quotes. This is especially helpful because it shows exactly what the client asked for. In this case, GET /apache_pb.gif HTTP/1.0 indicates a GET request, made over HTTP/1.0, for the image file apache_pb.gif in the root of the Web server's document directory.

Finally, the status code, 200, indicates the request was completed successfully. The last value is the size, in bytes, of the object returned to the client for that request, not counting the response headers.
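
To see how these fields line up, here is a minimal sketch that uses sed -E to split a Common Log Format entry into labeled fields. The -E option switches sed to the same extended regular expression syntax egrep uses. The client address 192.0.2.10 is a hypothetical documentation address. The pattern assumes the default field order described above, so a customized log format would need a different expression.

```shell
#!/bin/sh
# Minimal sketch: label the fields of one Common Log Format entry.
# 192.0.2.10 is a hypothetical client address (documentation range).
line='192.0.2.10 - frank [10/Oct/2007:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

# Capture groups: 1=client  2=identd  3=user  4=time stamp
#                 5=request line  6=status code  7=bytes sent ("-" when none)
# Lines that do not match the pattern are printed unchanged.
clf_fields() {
  sed -E 's/^([^ ]+) ([^ ]+) ([^ ]+) \[([^]]+)] "([^"]*)" ([0-9]{3}) ([0-9]+|-)$/client=\1 | identd=\2 | user=\3 | time=\4 | request=\5 | status=\6 | bytes=\7/'
}

printf '%s\n' "$line" | clf_fields
# client=192.0.2.10 | identd=- | user=frank | time=10/Oct/2007:13:55:36 -0700 | request=GET /apache_pb.gif HTTP/1.0 | status=200 | bytes=2326
```

Because sed passes non-matching lines through untouched, a command such as clf_fields < access_log | grep -v '^client=' is one way to list entries that do not fit the expected format.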

What to watch for

Here are a few key things to keep an eye out for when searching logs:

• Executable file requests, such as /system32/cmd.exe?/c+dir

• File system paths for *nix, such as /var/log or /etc/shadow

• SQL injection attempts, such as ' or 1=1-- or SELECT

• High numbers of login attempts

• Attempts to access restricted areas of your site

• TRACE or OPTIONS request methods

• High numbers of 404 or 500 return codes
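
As a starting point, several of the text indicators above can be combined into one case-insensitive search. This minimal sketch uses grep -E (equivalent to egrep) on a few hypothetical sample entries. The pattern is our own illustration, not a complete signature set. Attack strings in logs are often URL-encoded (%27 for a quote, %20 for a space), so the pattern also checks a couple of encoded forms.

```shell
#!/bin/sh
# Minimal sketch: flag entries that match some of the indicators above.
# The sample entries and client addresses are hypothetical.
sample_log=$(cat <<'EOF'
192.0.2.10 - - [10/Oct/2007:09:00:01 -0700] "GET /index.html HTTP/1.0" 200 2571
192.0.2.10 - - [10/Oct/2007:09:00:05 -0700] "GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0" 404 726
198.51.100.7 - - [10/Oct/2007:09:01:12 -0700] "GET /cgi-bin/view?file=../../etc/shadow HTTP/1.0" 403 512
198.51.100.7 - - [10/Oct/2007:09:02:40 -0700] "GET /item.php?id=1%27%20or%201=1-- HTTP/1.0" 500 798
198.51.100.7 - - [10/Oct/2007:09:03:00 -0700] "TRACE / HTTP/1.0" 405 300
EOF
)

# One alternative per indicator (matched case-insensitively):
#   cmd\.exe                        Windows executable requests
#   /etc/(passwd|shadow)|/var/log   *nix file system paths
#   (%27|')(%20|\+| )*or(%20|\+| )  ' or ... injection, raw or URL-encoded
#   select(%20|\+| )                SELECT followed by a (possibly encoded) space
#   "(TRACE|OPTIONS)                TRACE or OPTIONS request methods
pattern="cmd\.exe|/etc/(passwd|shadow)|/var/log|(%27|')(%20|\+| )*or(%20|\+| )|select(%20|\+| )|\"(TRACE|OPTIONS) "

# -n prints the line number of each hit; lines 2 through 5 match here.
printf '%s\n' "$sample_log" | grep -n -i -E "$pattern"
```

Against a real log, the equivalent is grep -n -i -E "$pattern" access_log. Attackers vary their encoding, so expect both misses and false positives.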

Step 2: Begin your log analysis and investigation
Now that we understand the log format, we can look for requests that indicate suspicious activity. For example, watch for requests for administrative components such as Webmin, a Web-based server administration tool, or admin, a common name for login interfaces. These names will appear in the request portion of the log entry, so we can simply search for them as strings with egrep:

>egrep -n webmin access_log

The structure of this is simple: egrep, followed by any configuration parameters, followed by the search criteria, followed by the name of the file to be searched.

In this case, -n displays the line number of each match, for reference.

This should produce any server log entries where a request was made to a URL containing webmin. An example return would look like:

57: 10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726

Breaking down our result: on line 57 of the log file, a request was made at 8:24 p.m. on Oct. 10 for the webmin directory on our Web server. The server returned a 404 status, meaning it could not find the directory. This matters because someone who legitimately needs the server's administrative functions would know where to find them. Bob could be searching for a way to break into the server.
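
One 404 is easy to spot by eye. On a busy server, a per-client tally of 404 responses (one of the warning signs listed earlier) can surface this kind of probing faster. This minimal sketch uses awk rather than egrep. It assumes the default log format, in which awk's whitespace splitting puts the client address in field 1 and the status code in field 9. A request line that itself contains extra spaces would shift those fields, so treat the counts as approximate. The sample entries and the 192.0.2.10 address are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: tally 404 responses per client address.
# Assumes the default format: field 1 = client, field 9 = status code.
# The sample entries are hypothetical.
sample_log=$(cat <<'EOF'
10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726
10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726
192.0.2.10 - - [10/Oct/2007:20:30:02 -0700] "GET /index.html HTTP/1.0" 200 2571
192.0.2.10 - - [10/Oct/2007:20:30:09 -0700] "GET /favicon.ico HTTP/1.0" 404 209
EOF
)

# Count 404s per client, then sort so the busiest client comes first.
count_404s() {
  awk '$9 == 404 { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' | sort -rn
}

printf '%s\n' "$sample_log" | count_404s
# 3 10.10.10.10
# 1 192.0.2.10
```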

Step 3: Refine your server log search
It may be worth searching for other requests by Bob, specifically ones that returned a 200 code, which would indicate that he found something. Our command could look something like this:

>egrep -n -i "bob|200" access_log

This finds log entries that contain Bob or the number 200 somewhere in them. However, that does not mean every entry returned is a 200 response to one of Bob's requests, and the search actually returns quite a bit of data we don't want. It would be more accurate to search for entries that contain both bob and 200. Because bob appears as a separate word, surrounded by white space, we can also require word boundaries around it to isolate the requests further. Also note the -i option, which makes the match case-insensitive so that Bob, bOb, boB, bob and BOB all match our query.

egrep -n -i "\bbob\b.*200" access_log

This command restricts our query to lines that contain the word bob followed, somewhere later in the line, by the string 200. The \b on both sides of bob marks a word boundary, the start or end of a word (\b is a GNU grep extension and may not work in every egrep implementation). The .* between bob and 200 matches any run of characters, including none, so other text can sit between them. No wildcard is needed after 200, because egrep matches a pattern anywhere in a line. This returns entries such as these:

57: 10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726

59: 10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726

65: 10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726

When inspecting the results, you will notice that Bob appears to be looking for something, perhaps an admin interface of some sort or a way into the Web server. Also, the time stamps show that all three requests were made within a little over a minute. That means Bob is either very fast on his keyboard or using an automated tool of some sort. The latter is more likely, and this may give us enough information to start investigating his actions further.
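
The time stamps are what suggest automation here. In this minimal sketch, awk prints the number of seconds between consecutive requests from one client. It assumes the default log format, and it assumes all the requests fall on the same day, because the date part is ignored. The client address is 10.10.10.10, the address searched for in Step 4. The sample restates Bob's three requests plus one hypothetical entry from another client.

```shell
#!/bin/sh
# Minimal sketch: seconds between consecutive requests from one client.
# Assumes the default format, where field 4 looks like [10/Oct/2007:20:24:18,
# and that all requests fall on the same day (the date is ignored).
# The 192.0.2.10 entry is hypothetical.
sample_log=$(cat <<'EOF'
10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
192.0.2.10 - - [10/Oct/2007:20:24:30 -0700] "GET /index.html HTTP/1.0" 200 2571
10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726
10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726
EOF
)

request_gaps() {
  awk -v client="$1" '$1 == client {
    split($4, t, ":")                      # t[2]=hour, t[3]=minute, t[4]=second
    secs = t[2] * 3600 + t[3] * 60 + t[4]
    if (have_prev) print secs - prev, $7   # gap in seconds, then the URL
    prev = secs
    have_prev = 1
  }'
}

printf '%s\n' "$sample_log" | request_gaps 10.10.10.10
# 41 /admin
# 36 /login
```

Short, regular gaps repeated over many requests are more typical of a tool than of a person browsing. This is a heuristic, not proof.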

Also, notice that all of Bob's requests received 404 "not found" responses. If that is the case, why did they show up? We asked only for 200 codes, right? This is a prime example of a computer doing exactly what you tell it to do. The year 2007 in each time stamp contains the string 200, and that is what we asked for. Regex searches often produce false positives, but our simple query still eliminated most of them.
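
One way to remove these time-stamp false positives is to tie 200 to the status-code field. In the default format, the status code follows the closing quote of the request line, so the sequence " 200 (quote, space, 200, space) normally appears only when the server actually returned 200. This minimal sketch compares a loose query, similar to the one above, with a stricter one that also requires bob to be the entire user field. The sample entries are hypothetical, and the pattern only holds under the default format.

```shell
#!/bin/sh
# Minimal sketch: match bob's successful requests without time-stamp hits.
# The sample entries below are hypothetical.
sample_log=$(cat <<'EOF'
10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726
10.10.10.10 - bob [10/Oct/2007:20:31:02 -0700] "GET /index.html HTTP/1.0" 200 2571
192.0.2.10 - alice [10/Oct/2007:20:32:10 -0700] "GET /index.html HTTP/1.0" 200 2571
EOF
)

loose='bob.*200'                        # any "200" after bob, including 2007
strict='^[^ ]+ [^ ]+ bob \[.*" 200 '    # user field is bob AND status is 200

count_matches() {
  printf '%s\n' "$sample_log" | grep -c -i -E "$1"
}

echo "loose:  $(count_matches "$loose")"    # counts all three of bob's entries
echo "strict: $(count_matches "$strict")"   # counts only bob's 200 response
```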

Let's investigate Bob a little further.

Step 4: Follow the trail
As a last-ditch effort to track all of Bob's activities, we can search for all requests that Bob made from his IP address. This requires escaping the periods in the IP address as part of the regex. Escaping is a method of telling a regex engine that instead of using the special meaning for a character, we want to use it as a literal search. Note the command below:

>egrep -n -i "10\.10\.10\.10" access_log

In this case, we are telling egrep to find every line that contains 10.10.10.10. Our results will look much like this:

57: 10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726

59: 10.10.10.10 - bob [10/Oct/2007:20:24:59 -0700] "GET /admin HTTP/1.0" 404 726

65: 10.10.10.10 - bob [10/Oct/2007:20:25:35 -0700] "GET /login HTTP/1.0" 404 726

120: 10.10.10.10 - - [10/Oct/2007:21:14:11 -0700] "GET /index.html HTTP/1.0" 200 2571

157: 10.10.10.10 - - [10/Oct/2007:21:50:59 -0700] "GET /parent/directory HTTP/1.0" 404 726

260: 10.10.10.10 - - [10/Oct/2007:22:25:15 -0700] "GET /support.htm HTTP/1.0" 200 1056

So now we have a pretty good idea that Bob is poking around the site, but hasn't necessarily violated any laws or crossed any boundaries. But, it's a good idea to continue to watch for logs containing this information.
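
The Step 4 search can be tightened further. Escaping the dots stops 10.10.10.10 from matching strings such as 10-10-10-10. However, the escaped pattern still matches inside other addresses, such as 10.10.10.100 or 110.10.10.10. Anchoring the pattern to the start of the line, and requiring the space that ends the client field, avoids both. This assumes the default format, where the client address comes first. The sample entries below are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: compare unescaped, escaped and anchored IP searches.
# All sample entries and addresses are hypothetical.
sample_log=$(cat <<'EOF'
10.10.10.10 - bob [10/Oct/2007:20:24:18 -0700] "GET /webmin HTTP/1.0" 404 726
10.10.10.100 - - [10/Oct/2007:20:26:00 -0700] "GET /index.html HTTP/1.0" 200 2571
110.10.10.10 - - [10/Oct/2007:20:27:00 -0700] "GET /index.html HTTP/1.0" 200 2571
192.0.2.10 - - [10/Oct/2007:20:28:00 -0700] "GET /reports/10-10-10-10.pdf HTTP/1.0" 200 4096
EOF
)

count_matches() {
  printf '%s\n' "$sample_log" | grep -c -E "$1"
}

count_matches '10.10.10.10'        # 4: each "." matches any character
count_matches '10\.10\.10\.10'     # 3: still matches inside other addresses
count_matches '^10\.10\.10\.10 '   # 1: only entries from 10.10.10.10 itself
```

Against a real log, the anchored form is egrep -n '^10\.10\.10\.10 ' access_log. The single quotes keep the shell from altering the backslashes.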

Using Web log data to stay alert
When looking for more dangerous attack indicators, keep an eye on the frequency and destination of requests. When monitoring an online banking application, for example, pay particularly close attention to requests for transfer records. Someone trying to view other customers' transfer records might produce entries like these:

- [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12345 HTTP/1.0" 200 1042

- [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12346 HTTP/1.0" 500 798

- [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12347 HTTP/1.0" 200 1042

- [10/Oct/2000:x:x:x -0700] "GET /banking/view/transfer.jsp?id=12348 HTTP/1.0" 500 798

Here we can see that someone noticed the id= parameter in the URL and incremented the number by one to find other transfer records. The 200 responses show that the server returned a page for some of those IDs. This is a serious breakdown in the security of the Web application and certainly something you will want to catch when analyzing your logs.
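
A simple way to surface this pattern is to count how many different transfer IDs each client requested. This minimal sketch does that with awk. It assumes the default format (client in field 1, requested URL in field 7) and the transfer.jsp?id= URL shape from the example. It counts distinct IDs rather than checking that they are strictly sequential. The sample entries and client addresses are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: count the distinct transfer IDs each client requested.
# Sample entries and client addresses are hypothetical; the URL shape
# follows the example entries above.
sample_log=$(cat <<'EOF'
198.51.100.7 - - [10/Oct/2007:10:00:01 -0700] "GET /banking/view/transfer.jsp?id=12345 HTTP/1.0" 200 1042
198.51.100.7 - - [10/Oct/2007:10:00:02 -0700] "GET /banking/view/transfer.jsp?id=12346 HTTP/1.0" 500 798
198.51.100.7 - - [10/Oct/2007:10:00:03 -0700] "GET /banking/view/transfer.jsp?id=12347 HTTP/1.0" 200 1042
198.51.100.7 - - [10/Oct/2007:10:00:04 -0700] "GET /banking/view/transfer.jsp?id=12348 HTTP/1.0" 500 798
192.0.2.10 - - [10/Oct/2007:10:05:00 -0700] "GET /banking/view/transfer.jsp?id=20001 HTTP/1.0" 200 1042
192.0.2.10 - - [10/Oct/2007:10:06:00 -0700] "GET /banking/view/transfer.jsp?id=20001 HTTP/1.0" 200 1042
EOF
)

distinct_ids_per_client() {
  awk '$7 ~ /^\/banking\/view\/transfer[.]jsp[?]id=[0-9]+$/ {
         if (!seen[$1, $7]++) ids[$1]++   # count each client/URL pair once
       }
       END { for (ip in ids) print ids[ip], ip }' | sort -rn
}

printf '%s\n' "$sample_log" | distinct_ids_per_client
# 4 198.51.100.7
# 1 192.0.2.10
```

A customer normally views a handful of their own records. One client touching many distinct IDs in a short window is worth a closer look.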

Brad Causey is a senior security analyst, author, and Web security engineer. He holds the following certifications: MCP, MCDST, MCSA, MCDBA, MCSE, MCT, CCNA, Security+, Network+, A+, CTT+, IT Project+, C|EH, GBLC, GGSC-0100, CIFI, and CISSP.

This was first published in April 2009
