The increasing risk of cybercrime and other malicious activity on the Internet is prompting enterprises to deploy more security controls and collect more data than ever before. As a result, advances in big data analytics are now being applied to security monitoring to enable broader and more in-depth analysis that protects valuable company resources. Called big data security analytics, this technology -- in part -- leverages the scalability of big data and combines it with advanced analytics and security information and event management (SIEM) systems.
Big data security analytics is appropriate for many, but not all, use cases. Consider the challenges of detecting and blocking advanced persistent threat techniques. Attackers who use these techniques may employ slow-paced, low-visibility attack patterns to avoid detection. Conventional logging and monitoring techniques can miss this kind of attack. Steps in the attack may occur on separate devices, over extended periods of time, and appear to be unrelated. Scanning logs and network flows for suspicious activity can miss key parts of an attacker's kill chain, since individual steps may not differ much from normal activity. One way to avoid missing data is to collect as much information as possible. This is the approach used in big data security analytics platforms.
As the name implies, this approach to security analytics draws on the tools and techniques designed for collecting, analyzing and managing large volumes of data generated at high velocity. These same techniques are used to drive products ranging from movie recommendation systems for streaming video users to analysis of vehicle performance characteristics that optimizes the efficiency of transportation fleets. They are just as useful when applied to information security.
When evaluating big data security analytics platforms, be sure to consider five factors that are essential to realizing the full benefits of big data analytics:
- Unified data management platform;
- Support for multiple data types, including log, vulnerability and flow;
- Scalable data ingestion;
- Information security-specific analytics tools; and
- Compliance reporting.
Together these features help to provide the breadth of functionality needed to collect large volumes of data at the speed at which they are generated, and to analyze the data fast enough to enable information security professionals to respond effectively to attacks.
Factor #1: Unified data management platform
A unified data management platform is the foundation of a big data security analytics system; the data management platform stores and queries enterprise data. This sounds like a well-known and solved problem that should not be a distinguishing characteristic, but it is. Working with large volumes of data typically requires distributed databases, as relational databases do not scale as cost-efficiently as distributed NoSQL databases such as Cassandra and Accumulo. The scalability of NoSQL databases, meanwhile, comes with its own drawbacks. For example, it is difficult to implement distributed versions of some database features that we might take for granted, such as ACID transactions.
The data management platform underlying a big data security analytics product has to balance data management features with cost and scalability. The database should demonstrate an ability to write new data in real time without blocking on writes. Similarly, queries should execute fast enough to support real-time analysis of incoming security data.
Another important aspect of a unified data management platform to consider is data integration.
Factor #2: Support for multiple data types
Big data is often described in terms of volume, velocity and variety. The variety of security event data presents a number of challenges to data integration.
Event data is collected at different levels of granularity. For example, network packets are low-level, fine-grained data, while log entries about a change to an administrator password on a server are rather coarse-grained. In spite of the obvious difference, the two could be linked. Network packets could capture data about the attacker's method for reaching a targeted server, while the log entry could record that the attacker, once having gained access, changed the administrator password.
The semantics of event data vary across data types. Network packet information helps analysts understand what data was transmitted between two endpoints, while the log of a vulnerability scan describes, to some degree, the state of a server or other device over an extended period of time. Big data security analytics platforms need sufficient information about the semantics of different data types to adequately integrate them.
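As a rough illustration of this kind of integration, the sketch below maps two differently shaped records (a network flow and a server log entry) onto one common event shape, then links them by host and time. The `Event` schema, the field names (`start`, `dst_ip`, `host_ip` and so on) and the 10-minute window are all assumptions made for the example; they are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Common shape for heterogeneous security records (hypothetical schema)."""
    timestamp: datetime
    host: str
    source: str   # "flow" or "log"
    summary: str


def from_flow(rec: dict) -> Event:
    # Fine-grained network flow record; the field names are assumed.
    return Event(
        datetime.fromisoformat(rec["start"]),
        rec["dst_ip"],
        "flow",
        f"{rec['src_ip']} -> {rec['dst_ip']}:{rec['dst_port']}",
    )


def from_log(rec: dict) -> Event:
    # Coarse-grained server log entry; the field names are assumed.
    return Event(
        datetime.fromisoformat(rec["time"]),
        rec["host_ip"],
        "log",
        rec["message"],
    )


def correlate(events, window=timedelta(minutes=10)):
    """Pair each log event with flows to the same host that precede it within `window`."""
    flows = [e for e in events if e.source == "flow"]
    logs = [e for e in events if e.source == "log"]
    pairs = []
    for log in logs:
        for flow in flows:
            gap = log.timestamp - flow.timestamp
            if flow.host == log.host and timedelta(0) <= gap <= window:
                pairs.append((flow, log))
    return pairs


flow_records = [
    {"start": "2024-01-05T02:14:00", "src_ip": "203.0.113.7",
     "dst_ip": "10.0.0.5", "dst_port": 22},
    {"start": "2024-01-05T02:15:00", "src_ip": "10.0.0.8",
     "dst_ip": "10.0.0.9", "dst_port": 443},
]
log_records = [
    {"time": "2024-01-05T02:19:30", "host_ip": "10.0.0.5",
     "message": "administrator password changed"},
]

events = [from_flow(r) for r in flow_records] + [from_log(r) for r in log_records]
linked = correlate(events)
for flow, log in linked:
    print(f"{log.host}: {flow.summary} preceded '{log.summary}'")
```

The nested loop is fine for a handful of records; a platform working at big data scale would more plausibly index events by host and time instead of comparing every pair.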
Factor #3: Scalable data ingestion
Servers, endpoints, networks and other infrastructure components are constantly changing states. Many of these state changes generate useful information that should be transmitted to a big data security analytics platform. Assuming the network has sufficient bandwidth, the biggest risk is that the data ingestion component of the security analytics platform cannot keep up with incoming data. If that were the case, data could be lost, undermining the purpose of deploying a big data security analytics platform.
Systems can accommodate scalable data ingestion by sustaining high write throughput. Some databases are designed to support high-volume writes by using an append-only approach: data is appended to the end of a commit log instead of being written to an arbitrary block on the disk. This reduces the latency associated with random writes to magnetic disks. Alternatively, the data management system may maintain a message queue that acts as a buffer to hold data while it is written to disk. If there is a spike in messages or a hardware failure that is slowing write operations, data can accumulate in the queue until the database can clear the backlog of writes.
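The two ideas can be combined in a small sketch: a bounded in-memory queue absorbs a burst of events while a single writer thread appends them to the end of a log file. This is a minimal illustration using only the Python standard library, not how any specific database implements its commit log; the `AppendOnlyLog` class, the queue size and the event fields are hypothetical.

```python
import json
import os
import queue
import tempfile
import threading


class AppendOnlyLog:
    """Writes each record as one JSON line at the end of a file (sequential writes only)."""

    def __init__(self, path):
        self._file = open(path, "a", encoding="utf-8")

    def append(self, record):
        self._file.write(json.dumps(record) + "\n")

    def close(self):
        self._file.flush()
        self._file.close()


def writer(buffer, log):
    """Drain the buffer into the log until a None sentinel arrives."""
    while True:
        record = buffer.get()
        if record is None:
            break
        log.append(record)


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
commit_log = AppendOnlyLog(log_path)

# Bounded queue: when it is full, producers block instead of dropping data.
buffer = queue.Queue(maxsize=10_000)

worker = threading.Thread(target=writer, args=(buffer, commit_log))
worker.start()

# Simulated burst of incoming events.
for i in range(1000):
    buffer.put({"seq": i, "host": "10.0.0.5", "event": "login_failed"})

buffer.put(None)  # tell the writer to stop once the backlog is cleared
worker.join()
commit_log.close()

with open(log_path, encoding="utf-8") as f:
    stored = [json.loads(line) for line in f]
print(len(stored), "events persisted")
```

Making the queue bounded is a deliberate choice here: it trades producer latency for durability, so a slow disk causes back-pressure rather than silent data loss.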
Factor #4: Security analytics tools
Big data platforms, such as Hadoop and Spark, are general-purpose tools. While they are useful for building security tools, they are not in themselves security analytics tools. Analytics tools should scale to meet the size of data generated in an organization's infrastructure. In this respect, tools like Hadoop and Spark meet the criterion. In addition, however, security analytics tools should account for relations between different entity types, such as users, servers and networks.
Analysts should be able to query event data at a level of abstraction that makes sense from an infosec perspective. For example, an analyst should be able to query the links between users and the particular servers and applications they work with. This kind of querying calls for graph-like analytics tools rather than the conventional column and row queries used with relational databases.
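To make the contrast with row-and-column queries concrete, here is a small sketch of a graph-style query over users, servers and applications. The `EntityGraph` class, the `kind:name` node labels and the sample links are all invented for illustration; a real platform would typically rely on a dedicated graph store or graph-processing library.

```python
from collections import defaultdict, deque


class EntityGraph:
    """Undirected graph of security entities labelled 'kind:name' (hypothetical model)."""

    def __init__(self):
        self._adj = defaultdict(set)

    def link(self, a, b):
        self._adj[a].add(b)
        self._adj[b].add(a)

    def neighbors(self, node, kind=None):
        """Entities directly linked to `node`, optionally filtered by kind prefix."""
        found = self._adj.get(node, set())
        if kind is None:
            return set(found)
        return {n for n in found if n.startswith(kind + ":")}

    def path(self, start, goal):
        """Shortest chain of links from start to goal (breadth-first search), or None."""
        if start == goal:
            return [start]
        seen = {start}
        frontier = deque([[start]])
        while frontier:
            current = frontier.popleft()
            for nxt in sorted(self._adj.get(current[-1], ())):
                if nxt in seen:
                    continue
                if nxt == goal:
                    return current + [nxt]
                seen.add(nxt)
                frontier.append(current + [nxt])
        return None


graph = EntityGraph()
graph.link("user:alice", "server:web01")
graph.link("user:bob", "server:web01")
graph.link("user:bob", "server:db01")
graph.link("server:web01", "app:storefront")
graph.link("server:db01", "app:payroll")

# Who works with web01?
print(sorted(graph.neighbors("server:web01", kind="user")))

# Through which chain of entities is alice connected to the payroll application?
print(graph.path("user:alice", "app:payroll"))
```

The second query follows a chain of links whose length is not known in advance, which is awkward to express as a fixed number of relational joins but natural as a graph traversal.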
Factor #5: Compliance reporting
Compliance reporting is no longer a "nice to have" requirement. Many of the data elements reported for compliance purposes are tied to security best practices. Even in cases where companies are not required to maintain compliance reports, such reports can be useful for internal oversight.
When compliance reporting is required, review the reporting regimes included with various big data security platforms to ensure the needs of your business are met.
Effective deployment of big data security analytics platforms
Big data security analytics combines the scalability of big data platforms with the analysis capabilities of security analytics and SIEM tools. It is important to recognize that features of both, as well as the five factors described in this article, are needed for the effective deployment of big data security analytics platforms. Simply rebranding a big data platform with the word "security" or insisting that a SIEM can handle big data scales -- even if it was not designed from the ground up for that purpose -- in no way makes it a true big data security analytics platform.
In part one of this series, find out about the basics of big data security analytics in the enterprise.
In part two of this series, learn about the enterprise use cases for big data security analytics platforms.