- Robert Richardson, Editorial Director
Now that the big data storm has subsided, it’s time to move beyond the hype and really look at how machine data might provide actual value to enterprise security. One company that’s worth watching is Sqrrl Data Inc. The big data analytics provider announced in February that its Sqrrl Enterprise 2.0 offers “a full-stack security analytics solution for detecting and responding to advanced cybersecurity threats.”
The company, headquartered in Cambridge, Mass., entered the market two years ago with a souped-up analysis tool that sits atop a Hadoop installation. Hadoop, the open source framework hosted by the Apache Software Foundation, enables large organizations to build a distributed file system across server farms and quickly process searches and queries (programmatically) against large data sets. Enterprise-level analysis tools have been sorely lacking. Now Sqrrl is essentially doubling down on security as a space where a link analysis approach to data yields particularly fruitful results.
The founders of Sqrrl are ex-NSA data crunchers, part of a recent diaspora that has yielded several startups, including Area 1 Security, Synack and Morta Security, which was acquired by Palo Alto Networks in January. In this case, key founders of Sqrrl were involved in the development of Accumulo, a distributed key/value data store designed to handle the huge amounts of data sorted and sniffed by that ever-curious intelligence entity in Fort Meade. Roughly speaking, all those metadata-snarfing, top-secret programs that NSA whistleblower Edward Snowden exposed used Accumulo as their repository. The NSA released Accumulo as an Apache open-source project in 2011.
Accumulo's roots were in Google’s Bigtable project, which was highly influential in producing the current buildout of NoSQL databases. Among databases of its particular flavor of NoSQL (wide-columnar), Accumulo is the third-most deployed, after Cassandra and HBase. Accumulo stores its data in the Hadoop Distributed File System (HDFS), which enables the processing of enormous data sets. Because three copies of the data set are stored, an entry-level Hadoop cluster often consists of three standard Linux servers storing up to 10 TB each. If you’ve got more than 10 TB of data, you add servers.
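The sizing arithmetic in that last point can be sketched in a few lines of Python. This is a rough illustration only: the function name `servers_needed` and its parameters are hypothetical, and the sketch assumes the three-copy replication and 10 TB per server described above while ignoring operating system overhead, temporary space and headroom for growth.

```python
import math


def servers_needed(data_tb, replication=3, per_server_tb=10.0):
    """Estimate the server count for a replicated data set.

    Hypothetical helper for illustration; real capacity planning
    would also budget for overhead and growth.
    """
    # Every terabyte is stored `replication` times across the cluster.
    raw_tb = data_tb * replication
    # Assume each copy lives on a different machine, so the cluster
    # never has fewer servers than the replication factor.
    return max(replication, math.ceil(raw_tb / per_server_tb))


entry_level = servers_needed(10)   # 10 TB of data, three copies
larger = servers_needed(25)        # 25 TB of data, three copies
```

Under these assumptions, 10 TB of data fills the three-server entry-level cluster, and anything beyond that pushes the count up.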
The inherent advantage of a wide-columnar store is that it can handle a lot more data a lot faster than the relational databases found in security information and event management (SIEM) deployments. “Traditional SIEM systems have had a hard time keeping more than 30 or 60 or maybe even 90 days’ worth of data, because all these systems were developed in pre-Hadoop days,” said Ely Kahn, Sqrrl’s vice president for business development. “But we can take petabytes of data and not only store it cost-effectively but also search and query it at near real-time speeds.”
Fortune 20 companies are “literally looking at trillions of unique nodes,” Kahn said. Although Sqrrl’s software knows how to look for anomalies, he added, it also allows security analysts to be more efficient with their time, which is probably more important.
The approach Sqrrl takes in organizing and processing a year’s worth of all kinds of machine and network log data is based on visualizations of the links -- or associations -- between elements. “Google is probably the purveyor of the most popular graph algorithm in the world,” Kahn noted, “which is, of course, called PageRank. A lot of the way that Google does its searches on its semi-structured data is using graph algorithms that look at the links between Web pages to determine the importance [rank] of Web pages.”
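To make the link analysis idea concrete, here is a minimal sketch of the PageRank calculation Kahn mentions, written in Python. It is a textbook power-iteration version, not Sqrrl's or Google's implementation, and the host names in the example graph are invented for illustration. The damping value of 0.85 is the commonly cited default, assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank nodes by the links pointing at them (simple power iteration).

    `links` maps each node to the list of nodes it links to.
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start with rank spread evenly across all nodes.
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical connection graph: which machine talks to which.
example_links = {
    "laptop-a": ["file-server"],
    "laptop-b": ["file-server"],
    "file-server": ["db-server"],
    "db-server": ["file-server"],
}
example_rank = pagerank(example_links)
most_linked = max(example_rank, key=example_rank.get)
```

In this toy graph the file server ends up with the highest score because the most connections point at it, which is the same intuition an analyst would apply when looking for the systems that matter most in a web of log entries.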
Right now, it’s primarily the largest enterprises whose security analysts can take advantage of Sqrrl and link analysis. But Sqrrl isn’t the only game in town -- both Palantir Technologies Inc. and RSA offer Hadoop-based analytics of one kind or another, and it seems possible that competition may drive pricing down to levels more palatable for midsize organizations -- all the more so given that Hadoop clusters are generally built out of low-cost, standard rack-mount servers. The Securities and Exchange Commission hired Palantir in 2014 to help it detect insider trading through data analysis of investors’ transactions prior to mergers and acquisitions and other deals related to public companies. In the meantime, if you want to bring spy agency technology to bear on your attackers and you’ve got a global enterprise reach, you’ll want to give these tools a look.
Robert Richardson is the editorial director of TechTarget’s Security Media Group. Follow him on Twitter @cryptorobert.