Google hacking is the use of a search engine, such as Google, to locate a security vulnerability on the Internet. There are generally two types of vulnerabilities to be found on the Web: software vulnerabilities and misconfigurations. Although there are some sophisticated intruders who target a specific system and try to discover vulnerabilities that will allow them access, the vast majority of intruders start out with a specific software vulnerability or common user misconfiguration that they already know how to exploit, and simply try to find or scan for systems that have this vulnerability. Google is of limited use to the first attacker, but invaluable to the second.
When an attacker knows the sort of vulnerability he wants to exploit but has no specific target, he employs a scanner. A scanner is a program that automates the process of examining a massive quantity of systems for a security flaw. The earliest computer-related scanner, for example, was the war dialer, a program that dialed long lists of telephone numbers and recorded which ones were answered by a modem.
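To make the idea concrete, here is a minimal Python sketch of the matching half of a scanner. Given the server banners already collected from a batch of hosts, it flags the ones advertising a version the attacker knows how to exploit. The `KNOWN_VULNERABLE` set, the banner format, and the `find_vulnerable` function are hypothetical illustrations, and the probing step that a real scanner would perform against each address is deliberately left out.

```python
import re
from typing import Dict, List

# Hypothetical set of server versions the attacker already knows how to exploit.
# These strings are illustrative assumptions, not a real advisory list.
KNOWN_VULNERABLE = {"Apache/1.3.27", "ExampleHTTPd/0.9"}

# Matches a "Product/version" token at the start of a banner,
# e.g. "Apache/1.3.27 (Unix)" -> "Apache/1.3.27".
BANNER_RE = re.compile(r"^(\S+/[\d.]+)")


def find_vulnerable(banners: Dict[str, str]) -> List[str]:
    """Return the hosts whose banner advertises a known-vulnerable version.

    `banners` maps a host address to the banner its server returned.
    Collecting those banners (the slow, noisy part of a real scan) is
    assumed to have happened already.
    """
    hits = []
    for host, banner in banners.items():
        match = BANNER_RE.match(banner)
        if match and match.group(1) in KNOWN_VULNERABLE:
            hits.append(host)
    return sorted(hits)


if __name__ == "__main__":
    # Addresses from the 192.0.2.0/24 documentation range.
    sample = {
        "192.0.2.10": "Apache/1.3.27 (Unix)",
        "192.0.2.11": "Apache/2.4.58 (Unix)",
        "192.0.2.12": "nginx",
    }
    print(find_vulnerable(sample))  # ['192.0.2.10']
```

The same check works just as well for an administrator taking inventory of which of his own machines still run an outdated server.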
Today there are scanners that automatically query IP addresses to see what services they expose, and attackers often route their exploits through a proxy. A proxy is an intermediary system that an attacker can use to disguise his or her identity. For example, if you were to gain remote access to Bill Gates' computer and cause it to run attacks on treasury.gov, it would appear to the Feds that Bill Gates was hacking them. His computer would be acting as a proxy. Google can be used in a similar way, as an intermediary that stands between the attacker and the target.
The search engine has already gathered this information and will hand it over freely, without alerting the vulnerable site. Things get even more interesting when you consider Google's cache function. If you have never used this feature, try this:
Do a Google search for "SearchTechTarget.com." Click on the first result and read a few of the headlines. Now click back to return to your search. This time, click the "Cached" link to the right of the URL of the page you just visited. Notice anything unusual? You're probably looking at the headlines from yesterday or the day before. Why, you ask? It's because whenever Google indexes a page, it saves a copy of the entire page on its servers, and that copy is what you see until Google indexes the page again.
This can be used for a lot more than reading old news. The intruder can now use Google to scan for sensitive files without alerting potential targets -- and even when a target is found, the intruder can access its files from the Google cache without ever making contact with the target's server. The only server with any logs of the attack would be Google's, and it's unlikely they will realize an attack has taken place.
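The cache behavior described above can be modeled with a toy Python class. This is a simplified sketch, not how Google actually stores pages; `ToySearchCache`, `crawl`, and `cached` are hypothetical names. It shows the two properties the intruder relies on: the cached copy can be stale, and reading it never contacts the origin server.

```python
from typing import Callable, Dict


class ToySearchCache:
    """Toy model of a search engine's page cache (hypothetical, simplified)."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        # Function that actually contacts the site's own server.
        self._fetch = fetch_from_origin
        self._pages: Dict[str, str] = {}

    def crawl(self, url: str) -> None:
        # Indexing contacts the origin server once and saves a full copy.
        self._pages[url] = self._fetch(url)

    def cached(self, url: str) -> str:
        # Serving the saved copy never touches the origin server,
        # so the site's own logs never see this request.
        return self._pages[url]


if __name__ == "__main__":
    site = {"http://example.com/news": "Monday's headlines"}
    origin_hits = []

    def origin(url: str) -> str:
        origin_hits.append(url)  # record every request that reaches the site
        return site[url]

    cache = ToySearchCache(origin)
    cache.crawl("http://example.com/news")
    site["http://example.com/news"] = "Tuesday's headlines"  # the site updates

    print(cache.cached("http://example.com/news"))  # Monday's headlines
    print(cache.cached("http://example.com/news"))  # Monday's headlines
    print(len(origin_hits))  # 1
```

However many times the cached copy is read, the site itself records only the single crawl.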
An even more elaborate trick involves crafting a special URL that would not normally be indexed by Google, perhaps one carrying a buffer overflow or SQL injection attack. This URL is then submitted to Google as a new Web page. When Google's crawler fetches it, the crawler delivers the attack on the intruder's behalf and stores the resulting page in its searchable cache, where the intruder can later retrieve it. The result is a recipe for disaster.
How can you prevent Google hacking?
Make sure you are comfortable with sharing everything in your public Web folder with the whole world, because Google will share it, whether you like it or not. Also, to keep attackers from easily figuring out what server software you are running, change the default error messages and other identifiers. Often, when a requested page does not exist, servers return a "404 Not Found" page that says something like:
The requested URL /cgi-bin/xxxxxx was not found on this server.
Apache/1.3.27 Server at your web site Port 80
The only information that legitimate users really need is a message that says "Page not found." Withholding the rest keeps your site from turning up when an attacker searches for a specific flavor of server.
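A quick way to audit your own server is to request a page that doesn't exist and check what the response gives away. The Python sketch below assumes you have already captured the error page's body and its `Server` response header; `audit_error_response` and its regular expressions are hypothetical helpers written for illustration and are tuned to the Apache-style footer shown above.

```python
import re
from typing import List, Optional

# Footer that Apache-style default error pages carry, e.g.
# "Apache/1.3.27 Server at your web site Port 80".
FOOTER_RE = re.compile(r"([A-Za-z][\w-]*)/(\d+(?:\.\d+)*) Server at .+? Port \d+")

# A "Product/1.2.3" token anywhere in a Server response header.
VERSION_RE = re.compile(r"([A-Za-z][\w-]*)/(\d+(?:\.\d+)*)")


def audit_error_response(body: str, server_header: Optional[str] = None) -> List[str]:
    """List the server-identifying details an error response gives away."""
    findings = []
    footer = FOOTER_RE.search(body)
    if footer:
        findings.append(f"page footer reveals {footer.group(1)} {footer.group(2)}")
    if server_header:
        version = VERSION_RE.search(server_header)
        if version:
            findings.append(f"Server header reveals {version.group(1)} {version.group(2)}")
    return findings


if __name__ == "__main__":
    page = (
        "The requested URL /cgi-bin/xxxxxx was not found on this server.\n"
        "Apache/1.3.27 Server at your web site Port 80"
    )
    for finding in audit_error_response(page, "Apache/1.3.27 (Unix)"):
        print(finding)
    # page footer reveals Apache 1.3.27
    # Server header reveals Apache 1.3.27
```

On Apache, the `ServerSignature Off` directive removes that footer from server-generated pages, `ServerTokens Prod` trims the `Server` header to just "Apache" (which the sketch above accepts, since no version number remains), and `ErrorDocument 404` can point to a plain custom page.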
Google periodically purges its cache, but until it does, your sensitive files are still being offered to the public. If you realize that the search engine has cached files you do not want others to view, you can go to http://www.google.com/remove.html and follow the instructions for removing your page, or parts of it, from Google's database.
Contributed by John Jolly.