SAN FRANCISCO -- Security vendors often claim that attackers behave in unpredictable ways, but two of the researchers behind Verizon's annual Data Breach Investigations Report (DBIR) believe vendors can't spot emerging attack trends because they don't use modern data analysis techniques.
During a session at the 2014 RSA Conference, Wade Baker, managing principal for research and intelligence at Verizon, and Jay Jacobs, senior data analyst for Verizon, took on what they consider to be the poor security data analysis practices plaguing industry research efforts. Attacks on enterprises largely follow a small number of techniques, the pair noted, which can often be identified by asking the right research questions and using the right methods to scan the resulting data.
To prove their point, Baker and Jacobs utilized a variety of data analysis techniques to digest reports on 5,553 nonpublic data breaches from across nine industries, a sampling they considered to be "small data."
First, Jacobs showed a chart that simply measured the number of times an attack method was mentioned among the incident reports. A few attack techniques formed huge spikes at one end of the chart. Jacobs noted that certain attack techniques tend to be utilized in pairings, with some showing up as severe peaks and others registering as only a blip on the radar.
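The kind of tally behind such a chart can be sketched in a few lines. The example below is a minimal illustration, not the researchers' code: the incident records and technique names are invented placeholders, and the function names `technique_counts` and `pair_counts` are hypothetical. It counts how many reports mention each technique and how often two techniques turn up in the same report.

```python
from collections import Counter
from itertools import combinations

# Hypothetical incident records, invented for illustration (not DBIR data).
# Each record lists the attack techniques mentioned in one incident report.
incidents = [
    ["phishing", "malware"],
    ["phishing", "malware", "rootkit"],
    ["sql_injection"],
    ["phishing", "malware"],
    ["skimmer"],
]


def technique_counts(records):
    """Count how many incident reports mention each technique."""
    counts = Counter()
    for techniques in records:
        counts.update(set(techniques))  # count a technique once per report
    return counts


def pair_counts(records):
    """Count how often two techniques appear together in one report."""
    counts = Counter()
    for techniques in records:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(techniques)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for name, n in technique_counts(incidents).most_common():
        print(f"{name:15s} {n}")
    for pair, n in pair_counts(incidents).most_common(3):
        print(pair, n)
```

With real incident data, the first tally would produce the spikes Jacobs described, and the second would surface the techniques that tend to travel in pairs.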
Building on those rudimentary findings, Baker utilized cluster analysis, essentially the grouping of similar data objects within a set, to show how certain attack techniques can be linked to hackers' motives. By looking for characteristics as measured by the VERIS framework, the Verizon researchers discovered that attackers engaged in cyberespionage, for example, will most likely utilize phishing, rootkits and malware, among other exploit methods.
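The presentation did not specify which clustering algorithm was used, so the following is only a sketch of the general idea under stated assumptions: incidents are represented as sets of technique labels (invented here, not actual VERIS fields), similarity is measured with Jaccard distance, and groups are merged with simple single-linkage agglomerative clustering. The function names and the 0.5 threshold are hypothetical choices.

```python
# Hypothetical incidents described by the techniques observed in each one.
# The labels are invented for illustration and are not VERIS enumerations.
incidents = [
    {"phishing", "malware", "rootkit"},
    {"phishing", "malware"},
    {"skimmer", "ram_scraper"},
    {"ram_scraper", "skimmer", "brute_force"},
    {"phishing", "rootkit", "malware", "backdoor"},
]


def jaccard_distance(a, b):
    """Return 1 minus (shared techniques / all techniques in either set)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(items, threshold):
    """Merge clusters while any two members of different clusters
    are within `threshold` of each other (single linkage)."""
    clusters = [[i] for i in range(len(items))]
    merged = True
    while merged:
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                close = any(
                    jaccard_distance(items[i], items[j]) <= threshold
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if close:
                    clusters[x].extend(clusters.pop(y))
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    for group in single_linkage_clusters(incidents, threshold=0.5):
        print(group, [sorted(incidents[i]) for i in group])
```

On this toy input, the phishing-and-malware incidents fall into one group and the payment-card incidents into another, which is the sort of grouping that could then be compared against attacker motive.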
Machine learning was able to add even more clarity to those findings, according to Baker. By allowing a computer to "explore patterns" in the data set, Baker estimated that between 85% and 90% of all the examined breaches fell into one of nine attack patterns, including insider threats, point-of-sale attacks and Web application threats.
"If you were going to tell me about 5,000 breaches, I'd never imagine you could describe 85% to 90% of them in nine or so patterns," Baker said. "There's a reason attackers do what they do and why they choose the methods they do. We see evidence of these patterns, and if we see these patterns, there is some hope for us."
The pair noted that the ultimate goal of their research is to enable organizations to alter their security controls based on the attack types each individual company is likely to face, the industry vertical in which it falls and even the motivation of the attackers targeting it.
For example, by using Pearson's chi-squared test on the breach data, Jacobs showed how certain industry verticals were far more likely to see certain attack types. Enterprises in retail and finance face numerous denial-of-service attacks, according to the pair's analysis, whereas data theft and loss were more likely to affect healthcare organizations.
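Pearson's chi-squared test compares the observed counts in a contingency table with the counts that would be expected if the two variables (here, industry and attack type) were independent. The sketch below computes the test statistic and degrees of freedom in plain Python. The counts are invented placeholders, not figures from the Verizon data set, and the function name `chi_squared` is hypothetical.

```python
# Hypothetical contingency table, invented for illustration (not DBIR data).
# Rows: industries (retail, healthcare).
# Columns: attack types (denial of service, data theft or loss).
observed = [
    [30, 10],  # retail
    [10, 30],  # healthcare
]


def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for Pearson's chi-squared
    test of independence on a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if industry and attack type were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


if __name__ == "__main__":
    stat, dof = chi_squared(observed)
    print(f"chi-squared = {stat:.2f}, degrees of freedom = {dof}")
```

For one degree of freedom, the critical value at the 5% significance level is about 3.84, so a statistic well above that would suggest attack type and industry are not independent in the sample.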
Baker said that such findings could have important ramifications for how organizations form their security strategies. Standards such as ISO 27001 and COBIT 5 provide security teams with a list of good security controls, Baker commented, but they offer no real way to prioritize those controls based on industry-specific attack trends. A healthcare organization, for example, has no way of knowing which controls it should prioritize compared with a retailer.
"We're seeing these industries are in fact different," Jacobs said, "and through this work, we'll start to be able to prioritize controls."
Of course, to utilize modern data analysis techniques in enterprise security, organizations are first going to need access to the right data. The Verizon duo opined that this is one area where the industry consistently fails itself, particularly with the inability of vendors to ask good research questions and conduct analysis based on those questions.
To validate their claims, Baker and Jacobs performed a metadata analysis on 50 of the most notable security vendor reports produced within the last year, essentially scanning for how often certain words or phrases surfaced. Among the reports in the analysis were FireEye's Advanced Threat Report, Alert Logic's State of Cloud Security Report and even Verizon's own 2013 Data Breach Investigations Report.
While they found dozens of mentions of words like "percent" and "distribution," even a basic statistical term like "mean" cropped up only a handful of times. References to the lexicon associated with data mining and machine learning, two of the more advanced methods they implored the industry to utilize, were basically nonexistent.
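A term scan of this kind is simple to approximate. The sketch below assumes the reports are available as plain text. The sample sentences are invented placeholders rather than excerpts from any vendor report, and the term list and the function name `term_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical report excerpts, invented for illustration only.
reports = [
    "Forty percent of incidents involved malware; the distribution is skewed.",
    "Attacks rose 12 percent. The mean time to detect was not reported.",
    "Percent change by sector, with a long-tailed distribution of losses.",
]

# Terms to look for; whole-word, case-insensitive matches only.
TERMS = ["percent", "distribution", "mean", "machine learning"]


def term_frequencies(texts, terms):
    """Count whole-word occurrences of each term across all texts."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[term] += len(re.findall(pattern, lowered))
    return counts


if __name__ == "__main__":
    for term, n in term_frequencies(reports, TERMS).most_common():
        print(f"{term:18s} {n}")
```

Matching on word boundaries keeps a term like "mean" from being counted inside longer words such as "meaningful," although it cannot tell the statistical sense of a word from its everyday use.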
The pair noted that many of the vendor reports in the security industry simply don't seem to be "research-driven," and Jacobs highlighted a particular report produced by the Ponemon Institute that claimed "it didn't have enough data to calculate a sampling error." They said many reports simply rely on a single vendor's perspective, so if a particular company has access to network sensors through its products, it will simply use the numbers gained from those sensors, producing what Baker described as "convenient results."
In the case of the DBIR, the pair emphasized that Verizon collects information from many outside resources in addition to its own data, which they claimed allows Verizon's report to paint a much clearer picture.
Baker said the security industry needs to show more curiosity and perhaps even take inspiration from someone like Bill James, a statistical guru who used advanced data analysis to evaluate baseball players, eventually changing the paradigm professional baseball teams use to measure player performance.
"It is extremely common in our industry to state something and then treat it as a stone-cold fact, and we don't want to expose biases or flaws in our data because we don't want to look stupid," Baker said. "We're not seeing much statistics being used, and we think we need to get out of the 1800s."
An RSA Conference attendee who requested to be referred to only as Deb said she was drawn to the Verizon pair's presentation because of their work on the DBIR, one of the few industry reports she uses as a reference point in her own security organization. Deb also noted experiencing some of the same issues with vendor reports that were mentioned by Baker and Jacobs, especially in regard to Ponemon research that often uses a sample "too small to be meaningful."
She said she found many of the data analysis techniques covered by the Verizon presenters to be compelling, though she admitted she is simply too busy to deploy them in her own work environment. Deb said she'll be on the lookout for either Verizon or an organization like the SANS Institute to translate Baker's and Jacobs' work into a more easily digestible form.
"If you have to be PCI-compliant or face other regulatory compliance issues, you're probably already covering [the SANS top 20 and similar lists of controls]," Deb said. "If you get everyone to apply [those measures for compliance], no one really has time to go beyond that unless it's warranted."