It's become increasingly clear that organizational data is always at risk, especially from insiders. That's not news to anyone following information security headlines, but most organizations have played their own version of the ostrich game (stick your head in the sand and hope the problem goes away) because the processes and technologies to track data leakage were immature, and that's being kind.
Ultimately, bad guys predominantly want personally identifiable information (PII), such as mailing addresses, Social Security numbers and credit card information, which can be used for identity theft. Unfortunately, there is a huge market for information that can be used not only to loot accounts, but also to obtain credit in someone else's name. TJX Companies Inc. is the king of the rogues' gallery, illustrating the potential consequences of private data loss.
Another big source of data leakage is intellectual property (IP). We are all familiar with the theft of trade secrets at DuPont Co., but there are hundreds of other transgressions that never make the headlines because most organizations want to keep them quiet. Every business has a significant portion of its intellectual property in digital form, so at any given time, a malicious or unsuspecting employee can download information to removable media or attach it to an email message and BAM! Your data is gone.
Since the stakes are clear, what can we do to protect "outbound" content? It starts with a multifaceted approach that includes training, that is, reminding employees of the organization's policies and the consequences of not complying.
Yet the first step is not even training; it's figuring out what you are trying to protect. That means finding and surveying the data in your organization and defining who can use what and why. Simply locating data is helpful, because the process of organizing it often exposes leakage points that would otherwise go unnoticed.
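A first-pass survey can be as simple as walking a file share and counting pattern matches per file. The Python sketch below assumes the data of interest is U.S. Social Security numbers written as NNN-NN-NNNN and that files can be read as text; the pattern and the `survey_directory` name are illustrative only, and a real discovery tool would also have to handle binary formats, databases and endpoints.

```python
import os
import re
from collections import Counter

# Illustrative pattern: U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def survey_directory(root):
    """Walk a directory tree and count SSN-like strings in each file.

    Returns a Counter mapping file path -> number of matches, so the
    files holding the most candidate PII surface first via most_common().
    """
    hits = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Decode leniently; binary files simply yield few or no matches.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    count = len(SSN_PATTERN.findall(f.read()))
            except OSError:
                continue  # unreadable file: skip it rather than abort the survey
            if count:
                hits[path] = count
    return hits
```

Calling `survey_directory("/srv/shared")` (a hypothetical path) and printing `.most_common(10)` would list the ten files with the most SSN-like strings, a reasonable starting point for deciding who should be allowed to use them.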
OK, so now that the sensitive data has been located and you've determined what needs special protection, next comes technology. There are many techniques to identify sensitive data before it escapes, and accuracy is what separates an effective product from one perceived as a failure. Flag too many false positives, meaning data that isn't actually a violation, and you are mud. Miss something, meaning data leaks out, and you are mud. The objective is to stay out of the mud, since both scenarios waste a lot of time and money and don't stop the data leaks. Here is just a smattering (certainly not a comprehensive list) of the available techniques:
- Regular expressions -- RegEx tends to be the simplest of the detection techniques. This method merely looks for data formatted as a Social Security number, phone number, account number, etc. However, a malicious insider can easily get around it by changing the format of the data.
- Dictionaries -- Certain terms, such as the diagnosis codes commonly used in healthcare, indicate sensitive content. Gateway products use a dictionary of such terms to pinpoint data that should at least be investigated.
- Fingerprinting -- Many vendors use sophisticated algorithms in their devices to distill what an organization's sensitive data looks like. These products analyze the data designated as sensitive, develop a fingerprint of it, and look for outbound content that resembles that fingerprint.
- Heuristics -- Other techniques used primarily in the antispam business also apply to outbound content filtering, such as heuristics that train the device on examples of what is acceptable and what is not. This is similar to fingerprinting, but less sophisticated.
- Proximity matching -- One way to increase accuracy is to apply a proximity-matching formula that looks not only for certain words, but also at how those words are used relative to each other. This helps identify sender intent, rather than raising a flag every time a certain type of data appears.
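To make the regular-expression technique concrete, here is a minimal Python sketch for Social Security numbers. It accepts dashes, single spaces or no separator (used consistently), and it rejects number ranges the U.S. Social Security Administration never issues: area 000, 666 or 900 through 999, group 00 and serial 0000. The pattern and the `find_ssns` helper are illustrative assumptions, not any product's rules.

```python
import re

# NNN-NN-NNNN, NNN NN NNNN or NNNNNNNNN; \2 forces the same separator twice.
SSN_RE = re.compile(r"\b(\d{3})([- ]?)(\d{2})\2(\d{4})\b")

def find_ssns(text):
    """Return SSN-shaped strings that also pass basic structural checks."""
    results = []
    for m in SSN_RE.finditer(text):
        area, group, serial = m.group(1), m.group(3), m.group(4)
        # Areas 000, 666 and 900-999 are never issued.
        if area in ("000", "666") or area.startswith("9"):
            continue
        # Group 00 and serial 0000 are never issued either.
        if group == "00" or serial == "0000":
            continue
        results.append(m.group(0))
    return results
```

`find_ssns("SSN 123-45-6789")` returns `['123-45-6789']`, but `find_ssns("1 2 3 4 5 6 7 8 9")` returns an empty list, which shows how little reformatting it takes to slip past a pure pattern match.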
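The dictionary and proximity-matching techniques combine naturally: a bare dictionary hit on a word like "diagnosis" flags far too much, but requiring that word to appear near something that looks like a personal identifier is a better signal of intent. In this Python sketch the term list, the SSN-shaped identifier pattern and the default 10-word window are all hypothetical tuning choices, not values from any product.

```python
import re

# Hypothetical dictionary of terms that suggest medical content.
MEDICAL_TERMS = {"diagnosis", "patient", "prescription"}
# A token that looks like a Social Security number.
ID_RE = re.compile(r"\d{3}-\d{2}-\d{4}")

def tokenize(text):
    """Lowercase, split on whitespace and trim surrounding punctuation."""
    return [t.strip(".,;:()\"'").lower() for t in text.split()]

def proximity_hits(text, window=10):
    """Return (term, identifier) pairs that lie within `window` tokens of each other."""
    tokens = tokenize(text)
    term_pos = [i for i, t in enumerate(tokens) if t in MEDICAL_TERMS]
    id_pos = [i for i, t in enumerate(tokens) if ID_RE.fullmatch(t)]
    return [(tokens[i], tokens[j])
            for i in term_pos for j in id_pos
            if abs(i - j) <= window]
```

A message that merely mentions a diagnosis produces no hits, while "Patient 123-45-6789, diagnosis: flu" produces two. Widening the window catches more loosely worded leaks at the cost of more false positives, so it is one of the knobs to tune on real traffic.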
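Vendors' fingerprinting algorithms are proprietary, so the following Python sketch substitutes one well-known public technique, word shingling: hash every overlapping run of n words in a protected document, then count how many of those hashes reappear in outbound content. The shingle size of 5 and any alert threshold you put on the count are assumptions to tune against your own data.

```python
import hashlib
import re

def shingles(text, n=5):
    """Fingerprint text as the set of SHA-1 hashes of every n-word run."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    # A text shorter than n words yields a single shingle covering all of it.
    return {
        hashlib.sha1(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - n + 1, 1))
    }

def shared_shingles(protected_fp, outbound_text, n=5):
    """Count shingles in outbound text that also occur in a protected document."""
    return len(shingles(outbound_text, n) & protected_fp)
```

With `secret = shingles("The reaction proceeds at 80 degrees with catalyst X7 under nitrogen for six hours")`, the call `shared_shingles(secret, "FYI: the reaction proceeds at 80 degrees with catalyst X7 under nitrogen.")` returns 7. Counting shared shingles, rather than computing a percentage of the message, means a pasted paragraph is still caught when it is buried in a long email.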
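The heuristics item doesn't name a specific method, but a classic antispam example of training a filter on good and bad samples is Bayesian classification. The Python sketch below is a bare-bones two-class naive Bayes text classifier with add-one smoothing; the class labels and training snippets are made up for illustration, and a production filter would need far more training data and better tokenization.

```python
import math
from collections import Counter

LABELS = ("ok", "sensitive")

class NaiveBayesFilter:
    """Two-class multinomial naive Bayes over lowercase words, add-one smoothing.

    Both classes must be trained at least once before calling classify().
    """

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labeled example."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label):
        # log P(label) + sum of log P(word | label); the +1 keeps unseen words from zeroing it.
        vocab_size = len(set().union(*self.word_counts.values()))
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / (total + vocab_size))
        return score

    def classify(self, text):
        """Return the label with the higher log-posterior score."""
        words = text.lower().split()
        return max(LABELS, key=lambda label: self._log_score(words, label))
```

After training on a few labeled messages, `classify("confidential merger target")` leans toward `"sensitive"` because those words were seen mostly in sensitive examples.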
The reality is that every vendor and/or technology will likely use all of these techniques in some way, and probably quite a few others. To further complicate things, vendors use different language to describe the same methods, so determining which product will provide the best outbound content filtering for your environment will be an inexact science. The only way to figure that out is to actually test a few (meaning one to three, not five or 10) of the devices on your actual traffic.
Yes, this approach is resource-intensive and eats up time that you probably don't have, but can you really compromise on your accuracy? Not a chance. So get it right the first time, or the auditors will be making sure your successor does.
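When you test candidate products on your own traffic, it helps to score each one the same way. This Python sketch assumes you have a hand-labeled sample of messages (message ID mapped to whether it is a real violation) and each product's verdicts on the same messages; it reports the false positive rate and the miss rate, the two ways to end up covered in mud. The data shapes and the `score_product` name are hypothetical.

```python
def score_product(verdicts, truth):
    """Compare one product's verdicts against hand-labeled ground truth.

    truth:    message ID -> True if the message really is a violation
    verdicts: message ID -> True if the product flagged it (missing = not flagged)
    """
    tp = fp = fn = tn = 0
    for msg_id, is_violation in truth.items():
        flagged = verdicts.get(msg_id, False)
        if is_violation and flagged:
            tp += 1
        elif is_violation:
            fn += 1  # a leak the product missed
        elif flagged:
            fp += 1  # a legitimate message the product flagged
        else:
            tn += 1
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "miss_rate": fn / (fn + tp) if fn + tp else 0.0,
    }
```

Running the same labeled sample through each of the one to three products under test gives directly comparable numbers; which rate matters more depends on whether your bigger worry is analyst time or leaked data.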
Finally, determine whether a stand-alone device makes more sense or whether this content security capability should be integrated into another device, such as an email gateway or a unified threat management (UTM) device. Either can be the right answer; the decision points are really more about politics, scale and complexity within your organization than about technology.
If you've determined that 95% of your organization's risk comes from potential email leaks, then the outbound filtering capabilities within your existing email security device will suffice. If the company has a lot of complicated CAD/CAM drawings or drug compound data, then it only makes sense to use a product that tracks data not only on file shares and in databases, but also on endpoint devices.
Don't neglect the ramifications of scale either. In relatively small environments, the Web- and email-filtering capabilities of existing perimeter gateways should suffice. But in mega-enterprises, where egress points are numerous and geographically distributed and 1 Gbps networks are yesterday's news, a dedicated outbound content security platform is probably a more suitable option.
Looking ahead, outbound content filtering and leak prevention technology may become a feature of perimeter platforms and endpoint security suites, but it's not clear that will make things easier, especially in large enterprises. Why? It's all about a consistent policy. If different technologies protect different aspects of an environment, then it's hard to enforce a consistent policy.
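One way to keep policy consistent across a mix of technologies is to define the rules once and have every enforcement point, whether an email gateway, a Web proxy or an endpoint agent, evaluate that same definition. This Python sketch is purely illustrative: the rule names, patterns and actions (including the "Project Falcon" codename) are hypothetical, and commercial products each have their own policy formats.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "block" or "alert"

# Single, central policy; every enforcement point loads this same tuple.
POLICY = (
    Rule("ssn", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "block"),
    Rule("codename", re.compile(r"\bproject\s+falcon\b", re.IGNORECASE), "alert"),
)

def check_policy(content, policy=POLICY):
    """Return (rule name, action) for every rule the content triggers."""
    return [(rule.name, rule.action) for rule in policy if rule.pattern.search(content)]
```

If the email gateway and the endpoint agent both called `check_policy`, text blocked on its way out through email would also be blocked when copied to removable media, which is exactly the consistency that is hard to get from separately configured products.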
As with pretty much every other problem we are trying to solve in security, there is no silver bullet or generic solution. You can't just go down to the corner store and pick up a content security thingy. The best answer today is adopting a process that can help determine where data lives, what needs protection, and finally which technology mix is best for your organization's needs.
About the author
Mike Rothman is president and principal analyst of Security Incite, an industry analyst firm in Atlanta, and the author of The Pragmatic CSO: 12 Steps to Being a Security Master. Rothman is also SearchSecurity.com's expert-in-residence on information security management. Get more information about the Pragmatic CSO at http://www.pragmaticcso.com, read his blog at http://blog.securityincite.com, or reach him via e-mail at mike.rothman (at) securityincite (dot) com.