How to use (almost) free tools to find sensitive data

No matter how much security awareness training employees get, some of them will still store sensitive data in insecure places. As a security manager, finding that data becomes of paramount importance -- but how to do it? In this tip, John Soltys offers advice on ways to find insecurely stored data.

You've got sensitive data. Every organization does. And your users need it. If they didn't need it, why would you have it? (That's a different topic, but assume you keep it around for a good reason.)

Unfortunately, credit card and Social Security numbers can be stored almost anywhere, and they usually are. Sure, security managers can educate users with awareness programs that warn about the dangers of insecure storage. They can provide secure locations for the data while employees are working with it, and even create encrypted connections to move the data back and forth. In many cases, applications are specially built to house such data.

Users, however, will still build spreadsheets full of sensitive data and store them on the file server or right on their laptops. Data can even exist locally for accidental reasons, such as a critical table in a database backed up before an upgrade and never purged, even though the data in it could lead to a hefty fine for the company if it's ever compromised.

So what to do? Go hunting for the data, of course.

There are lots of high-end products that will automate the process of data discovery, but it doesn't take much effort to hunt down data yourself. As a bonus, finding it yourself avoids budget fights and saves your political capital for the purging effort that follows discovery.

The right discovery tool depends on where users might have stored the data. For Windows systems, Nessus, a network vulnerability scanner, has a Windows File Contents Compliance Check plug-in that can be customized to find specified types of data. Nessus also provides pre-made audit files for common types of sensitive data, such as credit card numbers, Social Security numbers and driver's license numbers, all of which are covered by most of the state breach-notification mandates.

Provide credentials with access to the file system (usually a domain admin account) and Nessus will identify the systems that fail the compliance check, meaning those that contain data matching one of the patterns. Nessus can be configured to show the location of the data and to mask what it found, so the data is not exposed in yet another location. Looking at all the content on a Windows box can take a long time, so consider segmenting the search by network.
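One way to plan a segmented search is to split a large address range into smaller subnets and scan them one at a time. The following is a minimal sketch using Python's standard ipaddress module; the function name scan_batches and the /24 batch size are assumptions chosen for illustration, and the resulting ranges would still have to be entered as scan targets by hand.

```python
import ipaddress


def scan_batches(cidr, new_prefix=24):
    """Split one network range into smaller subnets to scan one at a time.

    `cidr` is the full range to cover; `new_prefix` is the size of each
    batch (hypothetical default of /24, roughly 250 hosts per scan).
    """
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]
```

Calling scan_batches("10.20.0.0/22") returns four /24 ranges, 10.20.0.0/24 through 10.20.3.0/24, each of which can be scanned in its own session.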

Nessus is also inexpensive: $1,200 a year includes the audit files and all the vulnerability assessment functionality that made Nessus such a valuable tool in the first place. (A free version of Nessus is available, but only the "ProfessionalFeed" includes the sensitive content plug-ins and the pre-built audit files for detecting sensitive data. While it would be possible to create your own audit files with patterns that match SSNs, credit cards, etc., they won't work without the "ProfessionalFeed.")

Searching non-Windows environments -- Unix, Linux and Mac OS X -- requires a little more manual work, as the Nessus File Contents Compliance Check plug-ins work only on Windows file systems. Luckily, there are usually fewer of these systems to look through.

Grep is a great tool for this purpose: the powerful command-line utility is built into many operating systems. Each OS may ship a slightly different version of grep, so check the syntax for compatibility. Begin by creating a file of the patterns you want to find, one per line, such as [0-9]{3}-[0-9]{2}-[0-9]{4} for SSNs. (The {3} repetition syntax requires extended regular expressions, which is what the -E option in the command enables.) Then point grep at the pattern file and a directory. Something like this works on Red Hat Linux and Mac OS X:

grep -cEHilrs -f patterns /directory/to/search

Your best option is to redirect the output to a file so you can continue the investigation later.

Keep in mind that grep is not likely to find data in all binary files, but it can search text files and look into files created by applications like Microsoft Word.
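Where grep is unavailable or its options differ, the same search can be approximated in a short script. The following is a rough sketch in Python, not a replacement for the command above. The SSN pattern is the one given earlier; the credit card pattern (16 digits in groups of four) and the names PATTERNS, mask, scan_file and scan_tree are assumptions made for illustration. Matches are masked before they are reported, so the results do not become one more copy of the sensitive data.

```python
import os
import re

# The SSN pattern comes from the article. The card pattern is an assumed,
# simplified example (16 digits, optionally separated by spaces or hyphens).
PATTERNS = {
    "ssn": re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"),
    "card": re.compile(r"\b(?:[0-9]{4}[- ]?){3}[0-9]{4}\b"),
}


def mask(value, keep=4):
    """Replace every digit except those in the last `keep` characters with '*'."""
    head, tail = value[:-keep], value[-keep:]
    return re.sub(r"[0-9]", "*", head) + tail


def scan_file(path):
    """Yield (label, line_number, masked_match) for each hit in one file."""
    # errors="ignore" lets the scan continue past bytes that are not valid text.
    with open(path, "r", errors="ignore") as handle:
        for line_number, line in enumerate(handle, start=1):
            for label, pattern in PATTERNS.items():
                for match in pattern.finditer(line):
                    yield label, line_number, mask(match.group())


def scan_tree(root):
    """Walk `root` recursively and yield (path, label, line_number, masked_match)."""
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            try:
                for label, line_number, masked in scan_file(path):
                    yield path, label, line_number, masked
            except OSError:
                # Skip files that cannot be opened or read, much as grep -s
                # suppresses error messages about unreadable files.
                continue
```

Iterating over scan_tree("/directory/to/search") and writing each result to a protected file gives a record to work from later. Like grep, this sketch reads files as text, so it will miss data stored in compressed or encoded binary formats.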

Finally, think about your databases. Over time these seem to multiply. Data becomes misplaced when an application is decommissioned or the principal user leaves the company.

Happily, it just takes a little script (with proper credentials, of course) to discover what sort of data resides in databases you may not be familiar with. The script can be written in whichever language you're most comfortable with, as long as it can connect to the database. It needs to connect, get a list of the tables, extract the first few and last few rows of each table, and write that to a secured file.

The following is an example in pseudo code:

for each database in list_of_databases
    connect to database
    get list_of_tables

    for each table in list_of_tables
        get first five rows
        get last five rows
        write to output file
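As one possible concrete version of the pseudocode, here is a sketch in Python that uses SQLite as a stand-in, only because its driver ships with the language. A real environment would swap in the driver and table-listing query for its own database product. The function names sample_tables and write_samples are hypothetical, and "first" and "last" are interpreted here as lowest and highest rowid, which is an assumption that does not hold for every table type.

```python
import os
import sqlite3


def sample_tables(db_path, rows=5):
    """Return {table_name: (first_rows, last_rows)} for each table in a SQLite file."""
    samples = {}
    connection = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [name for (name,) in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + table.replace('"', '""') + '"'
            # Assumption: rowid order stands in for "first" and "last".
            # Tables created WITHOUT ROWID would need a different ORDER BY.
            first = connection.execute(
                f"SELECT * FROM {quoted} ORDER BY rowid ASC LIMIT ?",
                (rows,)).fetchall()
            last = connection.execute(
                f"SELECT * FROM {quoted} ORDER BY rowid DESC LIMIT ?",
                (rows,)).fetchall()
            samples[table] = (first, last[::-1])
    finally:
        connection.close()
    return samples


def write_samples(samples, output_path):
    """Write the samples as tab-separated text to a file created with mode 0600."""
    # Owner-only permissions apply on Unix-like systems; Windows largely ignores them.
    fd = os.open(output_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as out:
        for table, (first, last) in samples.items():
            out.write(f"# {table}\n")
            # Small tables will appear twice, once as "first" and once as "last".
            for row in first + last:
                out.write("\t".join(str(column) for column in row) + "\n")
```

Running sample_tables on each database file in turn and passing the result to write_samples produces a plain-text file of sampled rows. Treat that file as sensitive, since it may contain exactly the data being hunted.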

Then use either Nessus or grep to search the output of the script for sensitive data patterns.

This approach is far from comprehensive, but can turn up many of the locations where users are storing sensitive data. Also, don't forget to record everything so you can use it to justify a more robust data-search product if needed in the future.

About the author:
John Soltys, GSNA, GCIH runs the information security program for The Seattle Times Co. in Seattle, Washington. When he's not securing data, he's hiking and climbing in the Cascades with his family.


This was last published in March 2009
