Data loss prevention (DLP) products are only as good as their ability to accurately identify and monitor sensitive data, and this begins with the data discovery process. Most organizations store sensitive data in a wide variety of formats and locations. Those include spreadsheets and text documents on network file shares; individual desktop and laptop systems; and databases, application-specific storage and storage area networks (...
When investigating data discovery options in DLP products, account for every file format and storage location in use in your organization. The most common file types include Microsoft Word, Excel and PowerPoint documents, Adobe PDF files, image files and plain text files. Network locations and types include file shares using the Common Internet File System and Server Message Block protocols, Network File System shares, and databases including Oracle, Microsoft SQL Server, Sybase and MySQL.
Support for scanning storage area networks using Fibre Channel and iSCSI; Web and FTP servers; and specific content-hosting tools such as wiki and bulletin board software may be important, as well. As you likely won't know all content types and locations in advance, a safe rule of thumb is "the more, the better."
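Before comparing products, it can help to get a rough picture of which of the file formats named above actually live on a given share. The following is a minimal sketch, not a feature of any DLP product: the extension map `FORMAT_FAMILIES` and the function `inventory_formats` are hypothetical names chosen for illustration, and classifying by file extension alone is an assumption that will misjudge renamed files.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping from file extension to the format families named in the text.
FORMAT_FAMILIES = {
    ".doc": "Word", ".docx": "Word",
    ".xls": "Excel", ".xlsx": "Excel",
    ".ppt": "PowerPoint", ".pptx": "PowerPoint",
    ".pdf": "PDF",
    ".txt": "plain text",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".gif": "image", ".tif": "image", ".tiff": "image",
}


def inventory_formats(root):
    """Count files under root by format family.

    Extensions not in FORMAT_FAMILIES are grouped under "other".
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            family = FORMAT_FAMILIES.get(path.suffix.lower(), "other")
            counts[family] += 1
    return counts
```

Running this against a mounted file share gives a per-format count; a large "other" bucket is a hint that the organization uses formats a candidate product would need to be checked against.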
Another key consideration is flexibility in performing searches. Most tools allow simple or more complex keyword matches; others also let you create sophisticated regular expression matches for specific content strings, pattern matching for known sequences such as credit card numbers and Social Security numbers, and even database-specific queries. Search filters such as time period, document modification date, last user account access and others should be mandatory requirements, as well.
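To make the idea of pattern matching for known sequences concrete, here is a minimal sketch of how such a search might work. It is an illustration under stated assumptions, not how any particular DLP product does it: the patterns `CARD_RE` and `SSN_RE` and the function `find_sensitive` are hypothetical, the card pattern assumes 13 to 16 digits with optional space or hyphen separators, and the Social Security number pattern assumes the common ddd-dd-dddd layout. The Luhn checksum is used to discard digit runs that cannot be valid card numbers.

```python
import re

# Assumed layout: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
# Assumed layout: ddd-dd-dddd.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return a list of (kind, matched_text) pairs found in text."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", m.group()))
    for m in SSN_RE.finditer(text):
        hits.append(("ssn", m.group()))
    return hits
```

A regular expression alone tends to produce false positives on order numbers and similar digit strings, which is why the sketch pairs it with a checksum. When evaluating products, it is worth asking whether their built-in patterns apply comparable validation.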
Once data is identified, data loss prevention products should fingerprint, or mark, the data in some way. Common methods for fingerprinting data include the use of cryptographic hash values using MD5 and SHA-1 algorithms, and proprietary tagging or labeling based on file attributes. Tagged data should be categorized based on attributes such as sensitivity and classification levels, location, type (financial and health care, for example) and monitoring frequency. Each tagged file's metadata should then be archived and retained for analysis and comparison in a central repository. As this repository can get very large, be sure to inquire about performance when performing historical searches and real-time comparisons for alerts and potential violations.
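The hash-based fingerprinting described above can be sketched in a few lines. This is a simplified illustration, not a vendor implementation: the `Fingerprint` record, its fields, and the functions `fingerprint_file`, `build_repository` and `lookup` are hypothetical names, and the "central repository" is assumed here to be a plain in-memory dictionary keyed by digest. SHA-1 is the default only because the text names it; the algorithm is a parameter.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical metadata record kept for each tagged file."""
    path: str
    digest: str
    sensitivity: str
    data_type: str


def file_digest(path, algorithm="sha1"):
    """Hash a file's contents in chunks so large files are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint_file(path, sensitivity, data_type, algorithm="sha1"):
    """Build a Fingerprint record for one discovered file."""
    return Fingerprint(str(path), file_digest(path, algorithm), sensitivity, data_type)


def build_repository(fingerprints):
    """Index fingerprints by digest (a stand-in for a central repository)."""
    return {fp.digest: fp for fp in fingerprints}


def lookup(path, repository, algorithm="sha1"):
    """Return the stored Fingerprint whose digest matches this file, or None."""
    return repository.get(file_digest(path, algorithm))
```

Note that an exact cryptographic hash only matches byte-identical copies; a file that has been edited even slightly produces a different digest. That limitation is one reason to ask vendors how their tagging handles modified or partial copies of sensitive content.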
About the author
Dave Shackleford is founder and principal consultant with Voodoo Security; a SANS analyst, instructor and course author; as well as a GIAC technical director. He has consulted with hundreds of organizations in the areas of security, regulatory compliance, and network architecture and engineering. He is a VMware vExpert and has extensive experience designing and configuring secure virtualized infrastructures, and is the lead author of the SANS Virtualization Security Fundamentals course. He has previously worked as chief security officer for Configuresoft; chief technology officer for the Center for Internet Security; and security architect, analyst and manager for several Fortune 500 companies. Additionally, Dave is the co-author of Hands-On Information Security from Course Technology.