Ryan Barnett of Breach Security Inc. and leader of the Web Application Security Consortium (WASC) Honeypot Project talks about phase three of the project, which uses an open proxy server to analyze Web attack data. Formally called the Distributed Open Proxy Honeypot Project, in phase three it will be more widely deployed, adding more participants and analytics. The Honeypot Project uses the open source mod_security Web application firewall (WAF) to monitor, identify and report the attack traffic.
Read the full transcript from this video below:
Please note the full transcript is for reference only and may include errors. To report an error, contact firstname.lastname@example.org.
WASC Web Honeypot Project enters next phase
Ryan Barnett: The actual name is a very long name, unfortunately for the project, but it is very descriptive. It is called the Distributed Open Proxy Honeypot Project. The idea, as you can tell from the name, is that we are different from typical honeypots in that we are not aiming to be the target of the attack. We know that the bad guys commonly like to use open proxies to loop through to hide their source IP. We are an open proxy; anybody can come in and use us and send a request wherever, and we will go ahead and process that for them. The difference is we have a certain rule set. We have ModSecurity, which is a web application firewall, with a rule set customized for the honeypots. When we see malicious traffic, we log it back to a central log host in real time. The aim of the project is really to gather that type of real web attack data, because there are a lot of different resources out there, like the OWASP Top 10 and CVE, different places where they track statistics on vulnerabilities, so those are likely ways that somebody could break in. But there is no concrete evidence to zero in and say, 'These are the specific vulnerabilities that are actually being targeted.' That is one of our aims: to highlight that, to either reinforce what the statistics are saying or to counteract them and say, 'No, these are the things you should focus on.'
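The sensor setup Barnett describes, ModSecurity watching and logging rather than blocking, can be sketched in a few directives. This is an illustrative fragment only: the rule IDs, log path, and detection patterns below are assumptions for the sake of example, not the project's actual rule set.

```apacheconf
# Honeypot-style ModSecurity sketch (illustrative, not the project's rules).
SecRuleEngine DetectionOnly        # observe and log, never block
SecAuditEngine On
SecAuditLogType Concurrent         # one audit entry per transaction, easy to ship centrally
SecAuditLog /var/log/modsec/audit.log

# A client using us as an open proxy sends a full URL in the request line.
SecRule REQUEST_LINE "@rx ^\w+\s+https?://" \
    "id:900001,phase:1,pass,log,msg:'Open proxy request observed'"

# Example detection rule: a classic SQL injection probe in any argument.
SecRule ARGS "@rx (?i:union\s+select)" \
    "id:900002,phase:2,pass,log,msg:'SQL injection attempt'"
```

In a real deployment the audit log entries would then be forwarded to the central log host the project uses for analysis.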
Interviewer: You are using an open proxy server. You are turning what the bad guys use against them, right?
Ryan Barnett: Absolutely. Before I came on with Breach Security, which sells application firewalls using ModSecurity, I was a consultant working with government clients in DC, and I was in charge of protecting their public websites. I always got frustrated when we would do the initial trace-back for a web attack: nine times out of ten it led to an open proxy server, and that is about as far as you can go. It is tough to get the coordination to track all of that back. That is why, when we were figuring out the best mechanism to gather this data, I said, 'Ah-ha, yes. Turn the tables on the bad guys.' They are not targeting us, but we know it is a common tool that they use, so we just get a bird's-eye view.
Interviewer: What does phase 3 actually do? What happens here? Is it going to be open sourced?
Ryan Barnett: Actually, that is a good question. The previous two phases were really learning phases; we were taking baby steps. They started in 2007. The first phase ran for about four months, the second for about three months. We had to take a hiatus for a while to figure out what we wanted to achieve in phase 3, what the hardware requirements were, and really re-architect everything. The two main goals we have for phase 3 are, first, many, many more sensors, geographically distributed and deployed into different network blocks. We have ISP home users, we have universities, we have some government participation. More sensors means a lot more data, so we are going to do a lot more data analysis than we did in the past. Before, it was high-level statistics; now we want to dive deeper. We want to use a lot of different tools to analyze the data to look for trends and different attacks.
Another big goal is to release data much more frequently. In previous phases, we just did a wrap-up report at the end of the phase, and we learned that you have to get data out when it is hot, when it is new, so it is relevant.
The real big thing is automation. We know that the bad guys, for the most part, are not just sitting at a keyboard manually trying to break into a website. Usually, they have automated tools that do the initial scanning, hitting as many servers as possible. Based on the data they get back, they might go back later and do some manual attacks and things like that. Automation is the bad guys' friend -- they use it extensively -- and the message for people who want to protect websites is that they need a means to identify when a client is automated and doing something a normal person would not do in a browser. Most websites, unfortunately, do not track that. They do not see a difference, which opens the website up to brute force attacks, where people break into your authentication, and things like that.
Yes, automation is the big thing that we have seen from the previous phase. We just started phase 3 yesterday, so we only have two sensors back online, but even with the data we have seen thus far, we already have over 170,000 transactions. A lot of it is spammers, but once again, it is automation. They are figuring out ways to post things to blogs and user forums, hawking what you would expect: Cialis, Viagra, anything that they can. It is automation, definitely. Websites need to have anti-automation capabilities to identify it.
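One cheap heuristic for the kind of automated clients Barnett describes is to look for the request headers that real browsers always send and scripted tools often omit. The rules below are a hypothetical sketch of that idea in ModSecurity syntax, not anything from the honeypot project; the rule IDs and user-agent list are made up for illustration.

```apacheconf
# Heuristic automation checks (illustrative only).
# Browsers send an Accept header; many bots do not.
SecRule &REQUEST_HEADERS:Accept "@eq 0" \
    "id:900020,phase:1,pass,log,msg:'Missing Accept header - possible automation'"

# Obviously scripted user agents.
SecRule REQUEST_HEADERS:User-Agent "@rx (?i:curl|wget|python|libwww)" \
    "id:900021,phase:1,pass,log,msg:'Scripted user agent'"
```

Neither check is conclusive on its own, since attackers can forge headers, which is why rate-based thresholds are usually layered on top.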
Interviewer: Is it easy for websites to deploy anti-automation capabilities? Easier said than done?
Ryan Barnett: Yes, because it is a big umbrella, and there are really different categories when you are talking about anti-automation. One example I gave was brute force protection: if you have a login page, you do not want people to brute force it, so that spammers do not create thousands of bogus email accounts to send their spam from. That requires a different threshold to monitor for than scraping-type attacks, where it is actually a legitimate user who logs in and has access to something your website does, such as generating reports. The issue is that they automate what they are doing, so they extract tons of information, much more than they should have. The upshot is that you need to know what you are protecting, then set appropriate thresholds to notice when somebody is automating their process.
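The threshold-based monitoring Barnett describes can be sketched with ModSecurity's persistent per-IP collections. This is a minimal, hypothetical fragment: the `/login` path, the 10-attempts-in-5-minutes threshold, and the rule IDs are assumptions chosen for illustration, and a production site would tune them to its own traffic.

```apacheconf
# Brute-force threshold sketch using a per-IP persistent collection
# (illustrative values -- path, threshold, and IDs are assumptions).
SecAction "id:900010,phase:1,pass,nolog,initcol:ip=%{REMOTE_ADDR}"

# Count POSTs to the login page per source IP; counter expires after 300s.
SecRule REQUEST_URI "@streq /login" \
    "id:900011,phase:2,pass,nolog,chain"
    SecRule REQUEST_METHOD "@streq POST" \
        "setvar:ip.login_attempts=+1,expirevar:ip.login_attempts=300"

# More than 10 login attempts in 5 minutes from one IP looks automated.
SecRule IP:LOGIN_ATTEMPTS "@gt 10" \
    "id:900012,phase:2,pass,log,msg:'Login threshold exceeded - possible brute force'"
```

A scraping threshold would follow the same pattern but count page or report requests per authenticated user rather than failed logins per IP.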