
Effective DLP products need data discovery and data fingerprinting

Effective DLP products must be able to handle data discovery to identify and monitor sensitive data. Learn why these features matter.

Data loss prevention (DLP) products are only as good as their ability to accurately identify and monitor sensitive data, and this begins with the data discovery process. Most organizations store sensitive data in a wide variety of formats and locations. These include spreadsheets and text documents on network file shares; individual desktop and laptop systems; and databases, application-specific storage and storage area networks (SANs).


When investigating data discovery options in DLP products, look for support for every file format and storage option in use in your organization. The most common file types include Microsoft Word, Excel and PowerPoint documents, Adobe PDF files, image files and plain text files. Network locations and types include file shares using the Common Internet File System (CIFS) and Server Message Block (SMB) protocols, Network File System (NFS) shares, and databases including Oracle, Microsoft SQL Server, Sybase and MySQL.
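To get a rough sense of which formats are actually in use before comparing products, a quick inventory of a file share can help. The following is a minimal sketch, not how any DLP product works internally: the `EXTENSION_TYPES` mapping and the `inventory_by_type` function are hypothetical names introduced for illustration, and the sketch infers type from the file extension alone, which a real discovery engine would not rely on.

```python
import os
from collections import defaultdict

# Hypothetical mapping from file extension to a coarse content type.
# Extend it to cover whatever formats your organization uses.
EXTENSION_TYPES = {
    ".doc": "word", ".docx": "word",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "powerpoint", ".pptx": "powerpoint",
    ".pdf": "pdf",
    ".txt": "text",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
}


def inventory_by_type(root):
    """Walk a directory tree and group file paths by inferred type."""
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            file_type = EXTENSION_TYPES.get(ext, "unknown")
            found[file_type].append(os.path.join(dirpath, name))
    return dict(found)
```

Pointing `inventory_by_type` at a mounted CIFS/SMB or NFS share returns a dictionary of paths per type. A large "unknown" bucket is a hint that the list of formats a DLP product must support is longer than expected.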

Support for scanning storage area networks using Fibre Channel and iSCSI; Web and FTP servers; and specific content-hosting tools such as wiki and bulletin board software may be important, as well. Because you likely won't know every content type and location in advance, a safe rule of thumb is "the more, the better."

Another key consideration is flexibility in performing searches. Most tools allow simple or more complex keyword matches, while others allow you to create sophisticated regular expression matches for specific content strings, pattern matching for known sequences such as credit card numbers and Social Security numbers, and even database-specific queries. Search filters such as time period, document modification date and last user account access should be mandatory requirements, as well.
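As an illustration of pattern matching for known sequences, here is a minimal sketch in Python. The two regular expressions are deliberately simplified and are assumptions of this example, not rules taken from any DLP product. The Luhn checksum step is also an addition of this sketch: it is a common way to discard digit runs that merely look like card numbers. The names `SSN_RE`, `CARD_RE`, `luhn_valid` and `find_sensitive` are hypothetical.

```python
import re

# Simplified, illustrative patterns; production rules are usually stricter.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched_text) pairs for SSN-like and card-like strings."""
    hits = []
    for m in SSN_RE.finditer(text):
        hits.append(("ssn", m.group()))
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", m.group()))
    return hits
```

Running `find_sensitive` over extracted document text yields candidate matches that can then be narrowed with filters such as modification date. The checksum trades a little extra computation for fewer false positives, which matters when scanning large file shares.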

Once data is identified, data loss prevention products should fingerprint, or mark, the data in some way. Common methods for fingerprinting data include cryptographic hash values computed with the MD5 and SHA-1 algorithms, and proprietary tagging or labeling based on file attributes. Tagged data should be categorized based on attributes such as sensitivity and classification levels, location, type (financial or health care, for example) and monitoring frequency. Each tagged file's metadata should then be archived in a central repository and retained for analysis and comparison. As this repository can get very large, be sure to ask vendors how it performs during historical searches and during real-time comparisons for alerts and potential violations.
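The hash-based approach can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not a vendor's implementation: the `Fingerprint` record, its field names, and the in-memory `Repository` class are hypothetical stand-ins for a product's tagging scheme and central repository. SHA-1 is the default here because the text names it; both MD5 and SHA-1 have known collision weaknesses, and `hashlib` accepts `"sha256"` as a drop-in alternative.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class Fingerprint:
    """Hypothetical metadata record kept for each tagged file."""
    path: str
    digest: str
    algorithm: str
    sensitivity: str  # e.g. "confidential"
    category: str     # e.g. "financial" or "health care"
    size_bytes: int
    modified: float   # modification time as a POSIX timestamp


def fingerprint_file(path, sensitivity, category, algorithm="sha1"):
    """Hash a file in chunks and return its metadata record."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    st = os.stat(path)
    return Fingerprint(path, h.hexdigest(), algorithm,
                       sensitivity, category, st.st_size, st.st_mtime)


class Repository:
    """In-memory stand-in for a central repository, keyed by digest."""

    def __init__(self):
        self._by_digest = {}

    def add(self, fp):
        self._by_digest[fp.digest] = fp

    def match(self, data, algorithm="sha1"):
        """Return the stored record whose digest equals that of data, or None."""
        digest = hashlib.new(algorithm, data).hexdigest()
        return self._by_digest.get(digest)
```

A whole-file hash only matches exact copies: changing a single byte produces a different digest, so this sketch would miss edited or excerpted documents. Keying the repository by digest keeps each lookup fast on average, which is the kind of real-time comparison worth asking vendors about as the repository grows.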



This was last published in April 2013
