Big data security analytics systems based on NoSQL databases and Hadoop processing systems are taking hold in a growing number of enterprises. While to date, big data systems have largely prioritized performance, experts say NoSQL database security is increasingly important, and the technology to support security for these nascent systems is poised to take a big step forward.
A NoSQL (Not only SQL) database is most often defined by what it is not, specifically that it is not a relational database. A relational database stores structured data on a single machine, but NoSQL databases are designed to store massive amounts of unstructured and semistructured data across clusters of machines.
The design of NoSQL databases has been optimized for low latency response and high performance, while still being scalable, making them perfect to support a security big data analytics system. Use of security big data systems is on the rise: Nemertes Research Group Inc. in a survey last year found that nearly a quarter of large organizations already use big data for security, while another 14% planned to implement security big data capabilities by the end of 2014.
"Relational databases tend to be monolithic, because they are designed for vertical scale," said Tyler Hannan, director of technical marketing at Washington, D.C.-based NoSQL vendor Basho Technologies Inc. "This means you only have one thing to secure. In NoSQL, you have to secure each node separately."
However, because of the differences in structure, scale across machines and variety in types of NoSQL databases, experts say there are unique security challenges that current security offerings have not fully addressed. Alexander Rothacker, lead security researcher at Chicago-based security vendor Trustwave Inc., noted that beyond these differences, security for NoSQL databases is a fragmented space. This is partly because there are four broad types of NoSQL databases -- document, key-value, wide column store and graph, exemplified by MongoDB, Riak, Cassandra and Neo4j, respectively -- and partly because multiple projects have aimed to add security layers to Hadoop, the open source processing framework for NoSQL databases.
Despite the differences and unique challenges, experts and vendors agreed on the core NoSQL database security aspects that enterprises must address: authentication (user identification), access control, encryption and auditing.
NoSQL authentication and access
The biggest difference between relational databases and NoSQL databases, and a major point of emphasis for vendors, is in authentication and access controls.
According to Ely Kahn, vice president of development at Cambridge, Mass.-based Sqrrl Data Inc., access control is important because of the distributed nature of NoSQL databases. Others, like Josh Shaul, vice president of product management at Trustwave, noted that the most likely attack vector for NoSQL databases is through an application, making access control the main protection against a breach.
Many NoSQL databases use Kerberos for authentication, which Rothacker said offers limited role-based access, but the problem is that Kerberos often isn't enabled. One of the major aims of many NoSQL security projects is to offer more fine-grained access controls, said Kahn, which often means setting up role-based or attribute-based access.
"Role-based access is easier for system administrators and more straightforward," Kahn said. "Attribute-based access is more complex because it sets parameters based on attributes like location or time."
Fremont, Calif.-based Dataguise Inc. recently released a product called DgSecure, which adds more fine-grained access controls and encryption to Cassandra NoSQL databases. Similarly, Hadoop security products like Apache Ranger and Apache Sentry, backed by Santa Clara, Calif.-based Hortonworks Inc. and Palo Alto, Calif.-based Cloudera Inc., respectively, aim to add more granular access controls, said Gualtieri, and each is going about the task from a different angle.
"Apache Ranger is all about fine-grained control to the services of Hadoop, including Apache Hive, and reading and writing to HDFS," Gualtieri said. "Apache Sentry is about fine-grained access to the stored data. So you may have access to Hive, but Sentry will prevent Hive from seeing the data, while Ranger will prevent you from accessing Hive."
Another important element for successful NoSQL authentication and access controls, according to Cloudera Director of Product Management Sam Heywood, is integration with existing authentication technologies, like Microsoft Active Directory, and the policies and procedures already set up in those products. This is why Apache Sentry was built to hook into existing Active Directory implementations.
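The contrast between role-based and attribute-based access drawn earlier in this section can be illustrated with a minimal sketch. It is not taken from any product named here; the role table, the `Request` fields and the "hq" and business-hours rules are hypothetical values chosen only to show how the two kinds of checks differ.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical role-to-permission table; the names are illustrative only.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write"},
}


@dataclass
class Request:
    role: str
    action: str
    location: str
    at: time


def rbac_allows(req: Request) -> bool:
    """Role-based check: the decision depends only on the user's role."""
    return req.action in ROLE_PERMISSIONS.get(req.role, set())


def abac_allows(req: Request) -> bool:
    """Attribute-based check: the role check plus conditions on location and time."""
    in_office = req.location == "hq"                      # assumed policy attribute
    business_hours = time(8, 0) <= req.at <= time(18, 0)  # assumed policy attribute
    return rbac_allows(req) and in_office and business_hours


late_request = Request(role="analyst", action="read", location="hq", at=time(23, 30))
print(rbac_allows(late_request))  # the role alone permits the read
print(abac_allows(late_request))  # the time-of-day attribute blocks it
```

The attribute-based version has more conditions to define and maintain, which is the added complexity the quote describes.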
Once a database has authentication and access controls set up, Gualtieri said the next key is confidentiality, which makes data available only to those users intended to see or use it, and this is implemented through encryption.
Rothacker said encryption is a weak point in the current collection of NoSQL security products, though his colleague Shaul questioned how much value encryption brings. Shaul said that organizations need to encrypt data for regulatory compliance, but that database encryption is more of an access control feature.
"The risks around NoSQL databases are mostly vulnerabilities that allow an attacker to bypass access controls," Shaul said. "Once an adversary bypasses access controls, the application will decrypt data anyway. Ultimately, encryption only keeps out database admins, but that leads to a paradox, because you need database admins to manage encryption."
According to multiple experts, including Adrian Lane, security analyst and CTO of Phoenix-based security firm Securosis LLC, the benefits of NoSQL encryption must be weighed against the implementation cost and effect on performance.
"The impetus is in performance and scalability," Lane said. "Application-layer encryption causes major performance hits. Transparent encryption scales well as you add nodes. It doesn't have performance hits, but it's not quite as secure."
Dataguise also noted the performance concerns, and said that if a database were encrypted at the cell level, a simple query could potentially require millions or even billions of encryption or decryption operations. In deference to both the access control ideas and performance concerns, Dataguise designed its DgSecure to encrypt sensitive data only if a user is not authorized to see that data.
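A rough back-of-the-envelope calculation shows how cell-level encryption could reach the operation counts mentioned above. The row and column figures below are hypothetical, not numbers from Dataguise, and the model assumes one decryption per encrypted cell touched by a scan.

```python
def decrypt_operations(rows_scanned: int, encrypted_columns: int) -> int:
    """Estimate decryptions for a scan: one per encrypted cell read."""
    return rows_scanned * encrypted_columns


# Hypothetical workload: a scan over 500 million rows.
all_cells = decrypt_operations(rows_scanned=500_000_000, encrypted_columns=4)
sensitive_only = decrypt_operations(rows_scanned=500_000_000, encrypted_columns=1)

print(f"{all_cells:,} decryptions if four columns are encrypted")
print(f"{sensitive_only:,} decryptions if only one sensitive column is encrypted")
```

Under these assumptions, limiting encryption to the sensitive column cuts the work by a factor of four, which is the kind of trade-off between protection and performance the experts describe.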
Intel Corp. has also tried to ingrain itself in Hadoop security, said Gualtieri, by backing the Apache Rhino project, which essentially optimizes NoSQL encryption on Intel microprocessor chips to make encryption processes faster and easier.
NoSQL databases are distributed across clusters of machines, Hannan noted, which means that in addition to encryption of data at rest, there needs to be consideration for data in motion.
"In a distributed system, servers talk to each other to make sure data is replicated properly," Hannan said. "So, intracluster communication should be encrypted with TLS, and should be encrypted through the wire."
Gualtieri noted that encrypting data in a NoSQL database isn't a trivial proposition, because one of the main reasons to use NoSQL over traditional relational databases is the ability to perform big data analytics.
"Encryption can scramble things and cause problems with analytics," Gualtieri said. "You need to use data masking to be able to protect the confidentiality of the data while still allowing for analytics to be performed without decrypting the data."
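One way to picture the masking idea is keyed pseudonymization, in which each sensitive value is replaced by a stable token so that grouping and counting still work. This is only one of several masking techniques and is an assumption here, not necessarily what Gualtieri had in mind. The key, field names and sample events are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical masking key; a real deployment would keep this in a key manager.
MASKING_KEY = b"demo-key-not-for-production"


def mask(value: str) -> str:
    """Replace a value with a stable token derived by HMAC-SHA256.

    The same input always yields the same token, so masked records
    can still be grouped, joined and counted.
    """
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


events = [
    {"user": "alice", "action": "login_failed"},
    {"user": "bob", "action": "login_failed"},
    {"user": "alice", "action": "login_failed"},
]

# Analysts see only tokens, never the original usernames.
masked_events = [{**event, "user": mask(event["user"])} for event in events]
failures_per_user = Counter(event["user"] for event in masked_events)

for token, count in failures_per_user.items():
    print(token, count)
```

The analysis still finds that one user failed to log in twice, without revealing who that user is to anyone who lacks the key.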
Integrity and auditing
Gualtieri said that in addition to authentication, authorization and encryption, NoSQL-based big data analytics security systems need integrity, meaning only authorized users can be allowed to change data. Also important is availability, which allows the application to perform as needed and also makes sure hackers can't lock up the system.
The last key property, according to Gualtieri, was nonrepudiation, which is an expanded idea of auditing in which the system logs not only what data was accessed, when and by whom, but also what changes were made to the data. Rothacker said the importance of auditing came from the assumption that a data breach is inevitable.
"You need to audit access to the systems," Rothacker said, "so even if you can't prevent the break-in, you have the forensic data to figure out what happened after the fact."
Auditing was another point where experts saw room for maturation with the current security projects for NoSQL.
"A log can show that a user went in," said Gualtieri, "but it may not show what they did. Auditing comes in different forms, and the current security projects are all trying to get more fine-grained here as well."
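The difference between logging that a user "went in" and logging what they did can be sketched as follows. The record fields and the `record` and `chain_is_intact` helpers are hypothetical. The hash chain that links entries is an added assumption for tamper evidence, not a feature the experts attributed to any current project, and full nonrepudiation would also require signed entries.

```python
import hashlib
import json
from datetime import datetime, timezone

audit_log = []


def record(user, action, key, old=None, new=None):
    """Append an entry capturing who did what, when, and what changed."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "key": key,
        "old": old,
        "new": new,
        # Link to the previous entry so later edits to the log are detectable.
        "prev": audit_log[-1]["hash"] if audit_log else "",
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    audit_log.append(entry)
    return entry


def chain_is_intact(log):
    """Recompute every hash and check each link to the previous entry."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev"] != prev or hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


record("alice", "read", "customer:42")
record("bob", "update", "customer:42", old={"tier": "basic"}, new={"tier": "gold"})
print(chain_is_intact(audit_log))
```

The second entry records the before and after values, which is the finer-grained detail a plain access log would miss.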
Ultimately, Gualtieri and others noted that authentication and access controls will remain the most important NoSQL database security features as products mature. Gualtieri said that while it remains unclear just how those capabilities will develop, he is confident that features beyond access control will mature quickly.
"The time horizon for things to get better is this year," said Gualtieri, because a growing number of enterprises are implementing Hadoop-based big data security analytics systems. "A couple years ago this was all experimental, but now there's a huge market demand for this."