Back in 2006, AOL released the search history of about 650,000 users, covering roughly 20 million searches. The data was made freely available on the Web and can still be found at about half a dozen mirrors. AOL's goal was to make the search information available to researchers, but the company didn't properly consider the invasion of privacy that such a release entails. AOL tried to anonymize the information, passing it through software that remapped each user's identity into another value before releasing it publicly. So, in a hypothetical example, you can't tell that user Fred Smith searched for "Fred Smith" and later searched for "halitosis." However, even with this remapping, anyone can see that the same anonymous number performed both searches, which strongly implies that good old Fred suffers from bad breath.
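To make the weakness concrete, here is a minimal Python sketch of consistent remapping. It assumes, for illustration only, that each username is swapped for a random number and that the same number is reused every time that user appears; AOL's actual software is not described in detail here, and the function names `pseudonymize` and `group_by_pseudonym` are hypothetical.

```python
import secrets
from collections import defaultdict

def pseudonymize(log):
    """Replace each username with a random number (hypothetical scheme).

    The same user always receives the same number, so identities are
    hidden but a user's searches remain linked together.
    """
    mapping = {}
    used = set()
    released = []
    for user, query in log:
        if user not in mapping:
            pid = secrets.randbelow(10**7)
            while pid in used:  # avoid giving two users the same number
                pid = secrets.randbelow(10**7)
            used.add(pid)
            mapping[user] = pid
        released.append((mapping[user], query))
    return released

def group_by_pseudonym(released):
    """Reassemble each anonymous number's full search history."""
    histories = defaultdict(list)
    for pid, query in released:
        histories[pid].append(query)
    return dict(histories)

log = [
    ("fred.smith", "Fred Smith"),
    ("jane.doe", "weather boston"),
    ("fred.smith", "halitosis"),
]
released = pseudonymize(log)
histories = group_by_pseudonym(released)
```

No username survives in `released`, yet grouping by the anonymous number puts "Fred Smith" and "halitosis" side by side in one history, which is exactly the inference described above.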
But that's personal information. To get to the point of your question: how does this impact corporate data security and intellectual property? Enterprise employees, especially those working with some of a company's most important intellectual property assets, frequently research new applications of their products, new markets the company is considering entering, competitors' products, potential merger and acquisition targets, and so on. Imagine looking at the search engine history for all IP addresses associated with some large company and separating it into individual users based on the cookie the search engine left on each browser. Surely, some very sensitive information about the organization's plans would be revealed.
Because of concerns about the sensitivity of search data, Google announced in March 2007 that it would anonymize its search logs after 18 to 24 months. That's better than keeping all search queries around forever, but it's still a pretty long time. Also, even after that period, Google doesn't delete user searches; it merely anonymizes them. Google has said that this process involves dropping some of the bits of a user's IP address as well as changing the cookie value, but the details are murky.
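Since Google hasn't published the specifics, the following Python sketch only illustrates the general idea under stated assumptions: it zeroes the low-order bits of an IPv4 address (keeping 24 bits by default, an assumed value) and replaces the cookie with a fresh random value per record. The record layout and the helper names `truncate_ipv4` and `anonymize_record` are hypothetical.

```python
import ipaddress
import secrets

def truncate_ipv4(addr: str, keep_bits: int = 24) -> str:
    """Keep only the first keep_bits bits of an IPv4 address (assumed policy)."""
    network = ipaddress.ip_network(f"{addr}/{keep_bits}", strict=False)
    return str(network.network_address)

def anonymize_record(record: dict) -> dict:
    """Hypothetical log anonymization: coarsen the IP, replace the cookie.

    A fresh random cookie per record means two old records from the same
    browser no longer share a cookie value. The query text is left as-is.
    """
    return {
        "ip": truncate_ipv4(record["ip"]),
        "cookie": secrets.token_hex(8),
        "query": record["query"],
    }

old = {"ip": "192.0.2.137", "cookie": "a1b2c3d4e5f60718", "query": "halitosis"}
new = anonymize_record(old)
```

Note that the query text itself passes through untouched; as the AOL case shows, the content of the searches can still be revealing on its own.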
To address this issue, other search engine companies have jumped on the privacy bandwagon, offering users an option to avoid having their search history stored on the company's servers at all. In July 2007, Ask.com announced its AskEraser feature, which lets users configure the Ask.com search engine not to log their search history on its servers. By default, Ask.com logs search queries for 18 months. To change this, simply click the "AskEraser" link near the top of the Ask.com page. A message pops up asking whether you want to turn on AskEraser. The service is easy to use, and it's a helpful option for people who want more anonymity. While Ask.com hasn't revealed the technical details of how it omits or destroys search history on its servers, such functionality is certainly feasible.
Please note that the discussion above concerns the search history stored on the search engine company's own servers. Even with AskEraser or Google's 18-to-24-month anonymization process, browsers still maintain a local browsing history that includes all recent searches, completely independent of what the search engine itself does with that information.
This was first published in March 2008