Adam Rice and James Ringold
Published: 30 Jan 2015
Organizations that start to address information security in a meaningful way will reach a point in their maturity when they have a lot of machine data. The challenge many CISOs face is how to leverage that data quickly and correlate events dynamically across the enterprise to track down advanced persistent threats (APTs). The Sony Pictures Entertainment hacking incident in November 2014 underscores the importance of security monitoring and rapid incident response to limit damage before disaster strikes.
IT security managers cannot protect what they cannot see, and to "see" associations or patterns that can help detect APTs, enterprises must have comprehensive logging in place across multiple layers within a network. The greater the visibility, the larger the volume of machine data, and the harder it is for cybersecurity incident response teams to "follow the thread" and correlate security events with threat intelligence in a meaningful way. The answers to many security questions about fraudulent activity, user behavior, communications, security risk and capacity consumption lie within these large data sets.
Why so much logging? Most advanced adversaries gain access to a victim's network via malware, drive-by links or Web shells. Once the initial attack phones home -- malware initiates an outbound connection to command-and-control (C2) hosts to get around inbound firewall rules -- rootkits are delivered, and attackers quickly gain access to a user account and drive around the network as a fully credentialed user. It is difficult to lock down a Microsoft network in any meaningful way without destroying its functionality. A successful strategy to defeat this type of attack includes the following:
- Detect the malware or drive-by links before users click on them. To do this, a cybersecurity incident response team has to be able to compare user behavior against threat intelligence, which requires full packet logging of all ingress and egress traffic at the enterprise's edge.
- Detect malware or rootkit delivery to the endpoint. To do this, the cybersecurity team needs verbose logging on antimalware and endpoint protection systems.
- Analyze user behaviors and access across the entire enterprise. Security information and event management (SIEM) tools can alert the team to unusual activity, such as account usage during off hours. This is only possible with comprehensive logging of Active Directory (AD) and host access events.
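The off-hours detection just described can be sketched in a few lines. This is a minimal illustration with hypothetical, simplified logon records; real AD security logs carry far more fields, and a SIEM would apply per-user baselines rather than a fixed business-hours window:

```python
from datetime import datetime

# Hypothetical, simplified logon records as (user, timestamp) pairs;
# real AD logon events carry many more fields (host, logon type, etc.).
LOGONS = [
    ("alice", "2015-01-12 09:14:03"),
    ("bob",   "2015-01-13 02:47:51"),
    ("carol", "2015-01-13 18:30:00"),
]

BUSINESS_HOURS = range(7, 19)  # 07:00-18:59 local time (an assumption)

def off_hours_logons(logons):
    """Return logon records that fall outside normal business hours."""
    flagged = []
    for user, ts in logons:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if when.hour not in BUSINESS_HOURS:
            flagged.append((user, ts))
    return flagged
```

Here, only bob's 02:47 logon would be flagged for analyst review; alice and carol fall inside the assumed window.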
All of this logging can generate close to a million potential security events a day at a large enterprise and terabytes of log data a month. While comprehensive logging is needed, several factors have to be considered when you increase logging across the enterprise. Infrastructure that is already heavily utilized might experience performance problems under the additional logging load. The network team should be involved in the design of the logging infrastructure to make sure that aggregating enterprise-wide logs does not degrade performance when all log sources are pointed at a few destinations. It's important to involve key stakeholders in the design and to balance the need for logging against the function of the applications. To see across an enterprise, verbose logging should be enabled throughout as follows:
- Layer 2 switching and choke points on enterprise distribution switches.
- NetFlow enabled and logged where possible.
- Critical services to send access and system logs.
- AD to log user behaviors.
- All Internet-exposed devices to log access and system events.
- Endpoint protection systems to log alerts.
- All firewall devices to log inbound access (accepts) and outbound access (accepts and denies).
- Other security devices to log alerts and access.
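As a small illustration of why logging both accepts and denies matters, the sketch below tallies firewall actions per source address, the raw material for the traffic profile discussed next. The log lines and the regular expression are hypothetical, iptables-style examples; real firewall log formats vary by vendor:

```python
import re

# Hypothetical iptables-style firewall log lines (vendor formats differ).
SAMPLE = [
    "Jan 30 10:02:11 fw1 kernel: ACCEPT IN=eth1 SRC=10.1.2.3 DST=198.51.100.7 DPT=443",
    "Jan 30 10:02:14 fw1 kernel: DROP IN=eth0 SRC=203.0.113.9 DST=10.1.2.3 DPT=22",
    "Jan 30 10:02:20 fw1 kernel: ACCEPT IN=eth0 SRC=198.51.100.7 DST=10.1.2.3 DPT=80",
]

LINE = re.compile(r"(ACCEPT|DROP)\s+IN=(\S+)\s+SRC=(\S+)\s+DST=(\S+)\s+DPT=(\S+)")

def summarize(lines):
    """Count accepted and denied connections per source address."""
    counts = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        action, _, src, _, _ = m.groups()
        counts[(src, action)] = counts.get((src, action), 0) + 1
    return counts
```

Running such counts over days of logs gives the "normal" baseline against which anomalous sources stand out.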
Most security programs begin with logs from the devices at the edge of the network, because those are usually easier to obtain. Firewalls, network intrusion detection systems and other network-based security products have robust and mature logging capabilities that most companies are already using. The level at which logging is configured is paramount for visibility into APT traffic as it leaves or enters your environment. This means that if there is an active intrusion, traffic entering and leaving the network edge has to be correlated with the suspicious traffic to reveal the entire communications channel -- malicious actors infiltrating the network, driving a compromised account, and then moving laterally across the enterprise. It's critical to be able to see both successful and denied traffic at the network edge to get a profile of what is normal for your business.
Network connectivity and communications
At the network edge, be sure that your logging doesn't have additional blind spots to traffic that can be used to bypass your security controls. Encrypted traffic, such as SSL/HTTPS, and services that are traditionally used for communication and data transfer, such as IRC and FTP/SFTP/SSH, should also be logged with detail.
Logging of services available to the public Internet is also of great interest, as these systems are the gateways to and from your infrastructure. Any Web server should log not only the connections into the server, but also the actions and input within the applications, so you can understand if they are being used as a bridge to your network. This logging should include not only internally developed Web applications and services but also vendor-provided appliances and applications that reside on those systems. The logging needs to enable you to see what is behind all network communications to and from your environment.
Any security device or system software within your network should also create logs. These security systems usually include, but are not limited to, antivirus or other host intrusion detection software. You can review the host logs on the systems to gain an understanding of the network accounts and computer systems that are used within the scope of the threat. Host firewall logs can be critical to understanding how the threats are moving around within the network after an initial compromise.
Like host-based firewall logs, NetFlow can help monitor the traffic within your network and identify areas that require further investigation. NetFlow can alert your team to unauthorized data-transfer activity within your network, or to sensitive information being staged for transmission outside of it.
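A rough sketch of the NetFlow analysis described above: aggregate outbound bytes per internal-to-external address pair and flag anything over a threshold. The flow records, the internal address range and the 500 MB threshold are illustrative assumptions; real NetFlow records also carry ports, timestamps and packet counts:

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("10.0.0.0/8")  # assumed internal address space

# Hypothetical flow records: (source, destination, bytes transferred).
FLOWS = [
    ("10.1.2.3", "198.51.100.7", 1_200_000_000),  # large outbound transfer
    ("10.1.2.3", "10.4.5.6", 80_000),             # internal-only traffic
    ("10.9.8.7", "203.0.113.9", 4_000),           # small outbound transfer
]

THRESHOLD = 500 * 1024 * 1024  # flag more than 500 MB leaving the network

def flag_exfiltration(flows):
    """Sum outbound bytes per (src, dst) pair and return pairs over threshold."""
    outbound = defaultdict(int)
    for src, dst, nbytes in flows:
        if ip_address(src) in INTERNAL and ip_address(dst) not in INTERNAL:
            outbound[(src, dst)] += nbytes
    return {pair: total for pair, total in outbound.items() if total > THRESHOLD}
```

In practice the threshold would be tuned per business, and alerts would feed the incident response queue rather than a simple report.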
Network authentication logs from AD and other LDAP-based services used for central authentication of users and network systems enable you to trace access within your environment and begin to determine which systems are involved in the threat. Many of the applications and systems in this list will have the capability to send logs to a centralized system, either through syslog or another facility. Having a central log collection and analysis system is crucial because trying to hunt through all of these systems, with multiple sources and locations, for the log information is tedious work. This log information will otherwise be written to local system logs on the hosts, which systems administrators will want to constrain so the data doesn't consume usable disk space. Security logs kept on individual systems will usually contain data for a few days at most, and in many situations only a few hours. That is not enough time to allow for analysis and review.
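A toy version of central collection over syslog might look like the following. This is only a sketch to show messages flowing into one flat file; production deployments use rsyslog, syslog-ng or a SIEM collector, and the file path here is a placeholder:

```python
import socketserver

LOGFILE = "central.log"  # placeholder path for the aggregate flat file

def store(line: bytes, path: str = LOGFILE) -> None:
    """Append one raw syslog message to the central flat file."""
    with open(path, "ab") as f:
        f.write(line.rstrip() + b"\n")

class SyslogUDPHandler(socketserver.BaseRequestHandler):
    """Minimal UDP handler: the datagram payload is the whole message."""
    def handle(self):
        store(self.request[0])

# To run (UDP 514 is the traditional syslog port and requires privileges):
#   socketserver.UDPServer(("0.0.0.0", 514), SyslogUDPHandler).serve_forever()
```

The point is retention: once messages land in one place on dedicated storage, the few-hours-to-few-days lifetime of host-local logs stops being the limit on an investigation.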
Most intrusions are not detected for months after the initial compromise (which may have been the case with Sony). An advanced attack usually goes unnoticed for more than a year, according to a 2012 report by Mandiant, now a division of FireEye, which is consulting on the Sony breach. If log data is not collected and retained during those months, identifying the source system or the persistence mechanism becomes impossible, and the threat may remain within your network for a very long time.
Big data problem
When the cybersecurity incident response team investigates an incident they must be able to follow the thread of events through logged data, and that path is interwoven through the Microsoft domain, security devices, edge devices, switches and routers. During a security event, time is essential in stopping the unauthorized exfiltration of data from a network. From the point of discovery to when an active defense is put in place and the adversary is stopped is a critical time.
To be successful in seeing, stopping and investigating a cyberevent, an enterprise must have the ability to quickly query very large sets of machine data. The notion of having a commercial off-the-shelf tool that has all the answers programmed into its graphical user interface is a fallacy. There is no fixed solution. Queries against large sets of machine data must be dynamic, and results must be presented quickly. For security analysts to be successful, they have to be able to manage big data.
As the number of log sources grows, so does the volume of the log data being collected. This growth never follows a linear path: each system generates more and more data, and with each system, another comes into scope. If all systems and devices are sending logs to a centralized system, which is the ultimate goal, the volume of data quickly becomes unmanageable.
With systems now producing more log data than ever before, and diverse data sources required to search out and locate a threat within the network, a new way to perform data analysis and identify correlated events is needed. Commercial SIEM companies are playing catch-up, positioning their products to support the large volumes of data produced and collected.
Big data analytics must provide the ability to correlate logging events based on time and user behavior across the entire spectrum of devices and technologies in an enterprise. Traditional SIEM tools are not good at this task because they organize data into databases, which become too big and clunky to query across; flat files of machine data typically support faster queries. Splunk Enterprise and IBM QRadar Security Intelligence Platform are examples of big data analytics tools, but organizations need to build an integrated tool set designed to complement the security analyst's needs. With these tools and processes come unique skills. The evolving job of the modern security analyst is exactly what the big data problem needs.
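The time-and-user correlation described above can be illustrated with a crude "follow the thread" function: walk one user's events in time order and keep those falling within a rolling window of the previous event. The normalized event records below are hypothetical; a real pipeline would parse them out of antivirus, AD and firewall logs and correlate across many more dimensions:

```python
from datetime import datetime, timedelta

# Hypothetical normalized events from different log sources.
EVENTS = [
    {"src": "antivirus", "user": "bob",   "time": "2015-01-13 02:45:10", "msg": "malware alert on WS042"},
    {"src": "ad",        "user": "bob",   "time": "2015-01-13 02:47:51", "msg": "logon to FILESRV01"},
    {"src": "firewall",  "user": "bob",   "time": "2015-01-13 02:55:02", "msg": "outbound 443 to 203.0.113.9"},
    {"src": "ad",        "user": "alice", "time": "2015-01-13 09:01:00", "msg": "logon to FILESRV01"},
]

def thread_for(user, events, window_minutes=30):
    """Return one user's events, in time order, keeping each event that
    falls within a rolling window of the previously kept event."""
    mine = sorted((e for e in events if e["user"] == user),
                  key=lambda e: e["time"])  # ISO timestamps sort lexically
    thread, last = [], None
    for e in mine:
        t = datetime.strptime(e["time"], "%Y-%m-%d %H:%M:%S")
        if last is None or t - last <= timedelta(minutes=window_minutes):
            thread.append(e)
            last = t
    return thread
```

For bob, this stitches the antivirus alert, the AD logon and the outbound firewall session into one thread, exactly the kind of cross-source chain an analyst walks during an investigation.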
With the right tools, a cybersecurity incident response team can follow the thread from a known event, like a malware alert, to behaviors of credentialed user accounts that are compromised, to machines from which the accounts are coming, to active IP sessions on the edge of the network. Without logging, none of this would be possible.
As CISOs build an active defense against the APT, the need to increase logging across the enterprise becomes a critical part of "seeing" and correlating events to track down the bad guys. It is not enough to simply turn on logging across the enterprise: People, tools and processes have to be established to use the data in a meaningful way.
Without a means of leveraging this big data quickly and dynamically, its usefulness disappears. Planning, process and skilled staff are all keys to using the large sets of machine data to win the battle against the APT. Before simply turning up logging across the enterprise, CISOs have to make sure that the budget is in place to acquire the big data analytics tools necessary to correlate events across the data, and that they have the staff with the expertise to use those tools. One without the other is not a workable solution.
About the authors:
Adam Rice is the CISO at Alliant Techsystems (ATK). An InfoSec professional with 17 years of experience, he has served as CSO of a global telecommunications company; general manager and vice president of a managed security services business; and director in several network consulting companies. He is a retired U.S. Army noncommissioned officer and a regular contributor to several information security publications.
James Ringold is a senior enterprise security architect at ATK, who has worked in the aerospace and defense, electronic discovery and investigations and retail industries, performing technical evaluations and building information security programs in various stages. As a security operations manager and incident responder for 17 years, he focused on countermeasures and controls to detect and mitigate cyberintrusions.