
Securing big data: Architecture tips for building security in

Expert Matt Pascucci advises a reader on securing big data with tips for building security into enterprise big data architectures.

There’s a lot of hype about “big data” and Hadoop in our organization right now. What key network security considerations should we think about when designing and deploying a Hadoop network?

Since “big data” is a hot topic these days, there’s no question that a growing number of enterprise infosec teams will be asked about the security ramifications of big data projects. There are many issues to consider, but here are a few tips for building security into big data projects during the architecture and implementation phases:

  1. Create data controls as close to the data as possible, since much of this data isn’t “owned” by the security team. The risk of big data traversing your network is that large amounts of confidential data (credit card data, Social Security numbers, personally identifiable information or PII, and so on) now reside in new places and are used in new ways. You’re also unlikely to see terabytes of data siphoned out of an organization at once. The bigger concern is someone searching these databases for patterns that reveal their sensitive content. Keep the security as close to the data as possible, and don’t rely on firewalls, IPS, DLP or other systems to protect it.
  2. Verify that sensitive fields are actually protected by encrypting them. That way, when the data is analyzed, manipulated or sent to other parts of the organization, you limit the risk of exposure. All sensitive information should be encrypted as soon as it comes under your control.
  3. Once the data is encrypted, the next logical concern is key management. There are a few newer approaches to key management, including generating keys on an as-needed basis so you don’t have to store them.
  4. In Hadoop designs, review the HDFS permissions of the cluster and verify that all access to HDFS is authenticated. Early Hadoop releases were notoriously weak at authenticating users and services, which allowed users to impersonate other users or even the cluster’s own services. Kerberos can be used to authenticate users and services to the Hadoop framework, and together with HDFS access tokens it can authenticate clients to the NameNode.
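
The following sketch makes item 2 concrete by showing field-level protection plus a verification pass. Python's standard library has no block cipher, so the sketch substitutes keyed tokenization (HMAC-SHA256) for encryption. Tokens are one-way, so if analysts need the original values back, a vetted authenticated cipher such as AES-GCM from a maintained crypto library is the better fit. The `SENSITIVE_FIELDS` set, the `tok_` prefix, the SSN pattern and all helper names are hypothetical.

```python
import hashlib
import hmac
import re

# Hypothetical set of field names treated as sensitive; adjust to your schema.
SENSITIVE_FIELDS = {"ssn", "card_number"}

# U.S. Social Security numbers written as NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a keyed, one-way HMAC-SHA256 token (not reversible)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest


def protect_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with every field in SENSITIVE_FIELDS tokenized."""
    return {
        name: tokenize(str(value), key) if name in SENSITIVE_FIELDS else value
        for name, value in record.items()
    }


def find_unprotected(record: dict) -> list:
    """Verification pass: list string fields that still contain a raw SSN-like value."""
    return [
        name
        for name, value in record.items()
        if isinstance(value, str) and SSN_PATTERN.search(value)
    ]


if __name__ == "__main__":
    key = b"placeholder-key"  # hypothetical; a real key would come from a key manager
    raw = {"customer_id": 42, "ssn": "123-45-6789", "notes": "called about 987-65-4321"}
    safe = protect_record(raw, key)
    print(safe)
    print(find_unprotected(safe))  # ['notes']: a free-text field still leaks an SSN
```

The verification step matters as much as the protection step. Sensitive values often leak into free-text or unexpected columns that a field list never names.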
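
One reading of item 3's “generating keys on an as-needed basis” is key derivation. You protect a single master key and recompute per-dataset keys from it on demand, so the derived keys never have to be stored. The sketch below assumes HKDF (RFC 5869) with SHA-256, built from Python's `hmac` and `hashlib`. The salt label, the `dataset=<name>;v=<version>` info format and `derive_dataset_key` are hypothetical choices, not part of any Hadoop API.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a pseudorandom key."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is supplied
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` bytes bound to `info`."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested length too large for HKDF-SHA256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) | info | i), with T(0) empty.
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_dataset_key(master_key: bytes, dataset: str, version: int = 1) -> bytes:
    """Hypothetical helper: derive a 256-bit key for one dataset on demand.

    Only the master key needs safekeeping (for example in an HSM or key manager);
    per-dataset keys are recomputed whenever they are needed instead of being stored.
    """
    prk = hkdf_extract(b"bigdata-kdf-salt", master_key)  # hypothetical salt label
    info = f"dataset={dataset};v={version}".encode("utf-8")
    return hkdf_expand(prk, info, 32)


if __name__ == "__main__":
    master = bytes(32)  # all-zero placeholder; never use a fixed master key in production
    key_a = derive_dataset_key(master, "claims")
    key_b = derive_dataset_key(master, "claims")
    key_c = derive_dataset_key(master, "payroll")
    print(key_a == key_b)  # True: re-derived on demand, nothing stored
    print(key_a == key_c)  # False: each dataset gets its own key
```

Incrementing `version` yields a fresh key for the same dataset, which gives a simple rotation path. Data protected under the old key still has to be re-encrypted.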
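
For item 4's permission review, a simple starting point is to scan an HDFS listing for overly open modes. This sketch assumes the usual `hdfs dfs -ls -R` column layout (permissions, replication, owner, group, size, date, time, path) and works on captured text rather than calling the CLI. The sample lines and the “world-readable file” rule are illustrative assumptions. A real review would also account for ACLs and for intentionally shared directories such as `/tmp`.

```python
from typing import List, NamedTuple


class Finding(NamedTuple):
    path: str
    permissions: str
    reason: str


def audit_listing(lines: List[str]) -> List[Finding]:
    """Flag world-writable entries and world-readable files in `hdfs dfs -ls` output."""
    findings = []
    for line in lines:
        # Assumed layout: perms, replication, owner, group, size, date, time, path.
        # maxsplit=7 keeps paths that contain spaces intact.
        parts = line.split(None, 7)
        if len(parts) < 8 or len(parts[0]) < 10 or parts[0][0] not in "d-":
            continue  # skip blank lines and headers such as "Found 3 items"
        perms, path = parts[0], parts[7]
        mode = perms[1:10]  # drop the type character and any trailing "+" ACL marker
        other_read, other_write = mode[6], mode[7]
        if other_write == "w":
            findings.append(Finding(path, perms, "world-writable"))
        elif other_read == "r" and perms[0] == "-":
            findings.append(Finding(path, perms, "world-readable file"))
    return findings


if __name__ == "__main__":
    # Illustrative sample; real input would be captured stdout of `hdfs dfs -ls -R /data`.
    sample = [
        "drwxr-xr-x   - hdfs  supergroup          0 2012-06-01 10:00 /data",
        "-rw-r--r--   3 etl   analysts      1048576 2012-06-01 10:05 /data/customers.csv",
        "drwxrwxrwx   - alice analysts            0 2012-06-02 09:12 /data/scratch",
    ]
    for finding in audit_listing(sample):
        print(f"{finding.reason:<20} {finding.permissions} {finding.path}")
```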
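
Whether HDFS access is authenticated can be spot-checked by reading the cluster's configuration files. `hadoop.security.authentication`, `hadoop.security.authorization`, `dfs.block.access.token.enable` and `dfs.permissions.enabled` are standard Hadoop configuration keys, but the pass/fail policy below is an assumption. A clean result also says nothing about whether Kerberos principals and keytabs are set up correctly.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Settings this sketch treats as required; the keys are standard Hadoop
# properties, but the policy (which ones must be set, and to what) is an assumption.
REQUIRED = {
    "hadoop.security.authentication": "kerberos",  # core-site.xml (default: simple)
    "hadoop.security.authorization": "true",       # core-site.xml (default: false)
    "dfs.block.access.token.enable": "true",       # hdfs-site.xml (default: false)
}


def load_properties(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop *-site.xml document into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def check_security(props: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for key, expected in REQUIRED.items():
        actual = props.get(key, "<unset>")
        if actual.lower() != expected:
            problems.append(f"{key} is {actual!r}, expected {expected!r}")
    # dfs.permissions.enabled defaults to true, so only an explicit "false" is flagged.
    if props.get("dfs.permissions.enabled", "true").lower() == "false":
        problems.append("dfs.permissions.enabled is 'false' (HDFS permission checks are off)")
    return problems


if __name__ == "__main__":
    core_site = """<configuration>
      <property><name>hadoop.security.authentication</name><value>simple</value></property>
    </configuration>"""
    for problem in check_security(load_properties(core_site)):
        print(problem)
```

In practice you would load both core-site.xml and hdfs-site.xml and merge the two dictionaries before calling `check_security`.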

There are many other areas of security in big data systems such as Hadoop, but authentication, encryption and permissions are three of the largest areas of concern during the big data architecture phase. As with most IT projects, building security in from the beginning is always better than trying to add it later.

This was last published in June 2012