Information Security

New measures for security metrics: Ranum Q&A with Jay Jacobs

Wading into the murky waters of security metrics? Jay Jacobs offers his take on data collection and incident reporting with the VERIS framework.

Information security metrics abound, but few reports garner the attention given to Verizon's Data Breach Investigations Report. The 2013 DBIR, which highlighted China's alleged cyberespionage among other significant breaches, was based on data pooled from 19 organizations worldwide.

Marcus Ranum had a bone to pick with one of the "top external actors" charts, fueled by a healthy skepticism he attributes to his college statistics classes. "[T]hose lectures had the effect of making me hyper-skeptical about any large, round number that's thrown my way," he blogged in May, shortly after the report was released.

This month, Ranum digs into some of the industry issues surrounding the report with co-author Jay Jacobs, a senior data analyst on the Verizon RISK team. Jacobs is also writing a book on exploring and visualizing data with Bob Rudis, Data Driven Security: Analysis, Visualization and Dashboards, due out in February.

Marcus Ranum: Jay, at this point pretty much everyone in computer security has read the Verizon DBIR, or at least looked at the pictures, so they may not realize it, but they're familiar with your work. I think it's safe to say that the computer security world needs to do a better job of keeping the information that would let us generate better security metrics. Could you briefly explain the VERIS framework and how that set of security metrics was developed?

Jay Jacobs: VERIS is an acronym for the Vocabulary of Event Recording and Information Sharing. It was developed to convert the narratives our forensic investigators would generate into data we could aggregate and analyze. We wanted to pull trends and answer simple questions like: Who was attacking who, why and how?

We had looked at the other taxonomies and struggled to find one that captured all the variety and complexity we saw in breaches. We tried to leverage the strengths of other taxonomies and borrow where we could. We have completely opened up the VERIS framework, so that others can adopt and implement it.
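To make the idea of turning investigators' narratives into data concrete, here is a minimal Python sketch of coding incidents into a fixed vocabulary and then counting them. It is not the VERIS schema: the `Incident` class, its field names, the category labels and the sample records are all hypothetical simplifications chosen for illustration. It only shows why a shared vocabulary matters: once every narrative is reduced to the same fields, questions like who was attacking whom, why and how become simple group-by counts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A hypothetical, heavily simplified incident record in the spirit of VERIS.

    Field names and category labels are illustrative assumptions,
    not the official VERIS schema.
    """
    industry: str  # the "whom": victim's industry
    actor: str     # the "who": e.g. "external", "internal"
    motive: str    # the "why": e.g. "financial", "espionage"
    action: str    # the "how": e.g. "hacking", "malware", "social"


def tally(incidents, *fields):
    """Count incidents grouped by the requested fields."""
    return Counter(tuple(getattr(i, f) for f in fields) for i in incidents)


# Hypothetical coded narratives, not real DBIR data.
incidents = [
    Incident("manufacturing", "external", "espionage", "malware"),
    Incident("manufacturing", "external", "espionage", "social"),
    Incident("finance", "external", "financial", "hacking"),
    Incident("finance", "internal", "financial", "misuse"),
    Incident("finance", "external", "financial", "hacking"),
]

print(tally(incidents, "actor", "motive"))
print(tally(incidents, "industry", "action").most_common(1))
# [(('finance', 'hacking'), 2)]
```

The same `tally` call works for any combination of fields, which is the payoff of agreeing on the vocabulary up front rather than mining free-text reports after the fact.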

Ranum: Where do you see VERIS going in the future? Is this the kind of thing that could eventually become a requirement for regulated industry segments?

Jacobs: There are two major challenges with collecting and sharing information like this. First, collecting information for every event takes time and resources, so we've got to limit the amount [of data] and the complexity of the questions we ask. Second, we always have more questions than the data can answer, and we want more detail and more data. We have to strike a balance by collecting just enough data to [cover] our questions, and VERIS does fairly well in that regard.

VERIS could absolutely be leveraged as a standard for breach disclosure, but in practice it would depend on the purpose of the regulation. If the intention is just to "shame and blame" the victim, as many current breach disclosures seem to, then VERIS isn't needed; we're already doing that. But if we really want to learn from these breaches and be able to aggregate within and across industries, then we'd need to move towards a common language and a framework.

Ranum: I was talking to some security people at a large corporation the other day, and they said they were using the VERIS framework. Personally, I really hope that kind of thing takes off. I guess my daydream scenario would be to see reporting frameworks like VERIS migrate down, somehow, to home users. All that we can go by is data from the antivirus companies and an occasional poll, right? We mustn't forget that home users are where the botnets come from…

Jacobs: VERIS is already capable of recording home user events, and we've looked at a few incidents that affect organizations -- the effect of Zeus on financial services, for example. Even with limited data around botnets, we know that we as an industry will need to tackle home user security. However, I think that is a different challenge from improving data collection.

The capability exists with fairly good accuracy -- especially with newer machine-learning algorithms -- to detect at the network layer when a computer joins a botnet, but then what? Sandbox them? Patch the system automatically? I think the valuable research around the home user is looking into the efficacy of the various treatment options. But any solution for the home user will have to be done without relying on their cooperation. They are notoriously, oh what's the word… inconsistent?

Ranum: I know that as a statistician, your primary interest is analysis and comprehension, and you want to avoid prescriptive results. But surely you know that people read a report such as Verizon DBIR and want to know what they should do based on it. How do you straddle that line and avoid dipping into advocacy?

Jacobs: First and foremost, we should do no harm. I don't think we avoid prescriptive results, because I'd love to be able to say, "Everyone do these five things, and we'll be OK." The data, however, just doesn't cooperate with that desire. The data doesn't let us tell people how to prevent all SQL injection, but it does tell us that SQL injection occurs more in some industries than others. We can only describe the trends within the actors, actions and assets, with the hope that people understand their own environment well enough to define priorities based on the trends we are seeing.

The data does allow us to be an advocate for some things. For example, most system attacks target valid credentials at some point in the attack chain. I feel confident in recommending two-factor authentication to almost all mature organizations -- I'd recommend it to all if it were easier and cheaper.

And the data also says that any list of top "N" controls for everyone is probably counterproductive, so I'd recommend some healthy skepticism of "top-whatever" lists. A large bank needs a different set of priorities than a large media company. But even though there is diversity across industries, the patterns within industries are where the really interesting stuff is, and we hope to explore that more in the upcoming 2014 DBIR.

While we face adaptive and intelligent adversaries, they are not unpredictable -- nor are they completely predictable. They appear to get into routines and habits. They find a handful of techniques and stick to those patterns. These patterns are so prevalent in our data that it gets hard to see some of the smaller patterns or the unique incidents that are occurring.

Ranum: Well, that's fantastic. Once you get to the point where you can say with confidence that this practice correlates with this outcome, I think we're able to start getting some traction. That raises the whole question of whether people and outcomes really are predictable. If we switch over to two-factor authentication, will we really see a benefit? That seems to be the kind of self-defeating reasoning people engage in when they choose to leave their seatbelt off, doesn't it?

Jacobs: Attackers are people, too. They form habits, get comfortable with certain tactics and avoid others and, perhaps more importantly, they generally won't fix their attack patterns until they break. This means that we should be able to detect and address these patterns on a large scale, which requires a shift in thinking.

If you ask me to describe the next attack, I'll probably get something about it wrong, maybe even all of it. But if you ask me to describe the next thousand attacks, I can do that better and with more accuracy.
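The contrast between one attack and a thousand is the law of large numbers at work. A minimal Python sketch, assuming a made-up mix of attack patterns (the labels and shares in `PATTERN_SHARES` are hypothetical, not DBIR figures): the best single-attack guess is right only as often as the most common pattern occurs, while the shares observed across 1,000 simulated attacks land close to the underlying mix.

```python
import random
from collections import Counter

# Hypothetical long-run mix of attack patterns: labels and shares are
# illustrative assumptions, not DBIR figures.
PATTERN_SHARES = {
    "stolen credentials": 0.40,
    "phishing + malware": 0.35,
    "sql injection": 0.15,
    "other": 0.10,
}


def simulate_attacks(n, rng):
    """Draw n attacks independently from the hypothetical pattern mix."""
    patterns = list(PATTERN_SHARES)
    weights = list(PATTERN_SHARES.values())
    return rng.choices(patterns, weights=weights, k=n)


rng = random.Random(2013)

# One attack: the best guess is the most common pattern, which under
# this mix is still wrong 60% of the time.
best_guess = max(PATTERN_SHARES, key=PATTERN_SHARES.get)
single_hit_rate = PATTERN_SHARES[best_guess]

# A thousand attacks: the observed shares land close to the long-run mix.
observed = Counter(simulate_attacks(1000, rng))
worst_error = max(abs(observed[p] / 1000 - share) for p, share in PATTERN_SHARES.items())

print(f"single-attack guess {best_guess!r} is right {single_hit_rate:.0%} of the time")
print(f"largest share error over 1,000 attacks: {worst_error:.3f}")
```

The independence of draws is itself an assumption; real attackers adapt, which is why the aggregate picture still has to be refreshed year to year.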

No organization can afford to stop every attack, so we have to think about what would stop most attacks, or even what changes will stop more attacks than other changes. The question then isn't if we'll see a benefit from two-factor authentication; of course, we will.

The real question is whether we'll get more bang for the buck from implementing two-factor authentication than from the few dozen other remediation projects waiting for funding. The data says attacks against password-based authentication are very common in system compromises. Adding a second factor may force attackers to change tactics (or stop the attack) more than other controls would. The data doesn't say it would stop the attacker, or that it's the most cost-effective solution for every company; the difference is subtle, but I think it's helpful.
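The bang-for-the-buck question can be framed as ranking a remediation backlog by estimated benefit per unit of cost. The Python sketch below is a hypothetical illustration: the `Project` type, the project names, and every cost and "attacks disrupted" figure are placeholders that an organization would replace with its own estimates, which is exactly why the answer can differ from one company to the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    cost: float               # hypothetical cost, in thousands of dollars
    attacks_disrupted: float  # hypothetical estimate of yearly attacks forced to change or stop


def rank_by_value(projects):
    """Order projects by estimated attacks disrupted per unit of cost, best first."""
    return sorted(projects, key=lambda p: p.attacks_disrupted / p.cost, reverse=True)


# Every figure below is a made-up placeholder for an organization's own estimates.
backlog = [
    Project("two-factor authentication", cost=120, attacks_disrupted=30),
    Project("web application firewall", cost=200, attacks_disrupted=25),
    Project("security awareness training", cost=40, attacks_disrupted=6),
]

for p in rank_by_value(backlog):
    print(f"{p.name}: {p.attacks_disrupted / p.cost:.3f} attacks disrupted per $1k")
```

With these placeholder numbers two-factor authentication ranks first, but changing any single estimate can reorder the list; breach data informs the inputs, not the verdict.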

Ranum: Let's talk about sampling bias. First off, if you can succinctly explain the problem, that'd be great. It seems that a lot of people don't understand why it's an issue. And, of course, how do you address it?

Jacobs: How much space do we have here? Put simply, sample bias exists when the sample is not representative of the population we are measuring. Sample bias generally exists whether or not we're doing data analysis. Think of the last big breach in the news: how many security practitioners attempted to extract meaning from that one event? That's sample bias. Within data analysis, we introduce sample bias during data collection, and one story from the media is very poor data collection.

One way to reduce sample bias -- we can never eliminate it -- is to create a random sample, but in order to draw a random sample from a population, we need access to the whole population of breaches, which is infeasible. So we are limited to the data we can collect, which is called a convenience sample. This is common in multiple disciplines; for example, hospitals can only collect data on the illnesses in their hospital.

One solution is to redefine the population we're sampling from. In our case, we are looking at breaches (within organizations) that were large enough to bring in law enforcement or forensics. The end result is that we cannot, if we're honest, apply inferential statistics to draw conclusions about the population. But this doesn't mean we should toss the data out; it just lowers our confidence in the precision of our findings.
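A small simulation shows how a convenience sample can skew an estimate, and why redefining the population helps. Everything here is assumed for illustration: a hypothetical population in which 20% of breaches are large enough to bring in outside help, and espionage is ten times more common among those large breaches (50% versus 5%). Sampling only the large breaches gives a good estimate for large breaches, but a poor one for breaches overall.

```python
import random


def make_population(n, rng):
    """Generate a hypothetical population of breaches as (is_large, is_espionage) pairs.

    Assumption for illustration only: 20% of breaches are large enough to
    bring in law enforcement or forensics, and espionage is far more
    common among them (50% vs. 5%).
    """
    population = []
    for _ in range(n):
        is_large = rng.random() < 0.20
        p_espionage = 0.50 if is_large else 0.05
        population.append((is_large, rng.random() < p_espionage))
    return population


def espionage_share(breaches):
    """Fraction of breaches flagged as espionage (True counts as 1)."""
    return sum(esp for _, esp in breaches) / len(breaches)


rng = random.Random(42)
population = make_population(100_000, rng)

# Convenience sample: we only ever see the large breaches.
convenience = [b for b in population if b[0]]

print(f"all breaches:       {espionage_share(population):.3f}")   # around 0.14
print(f"convenience sample: {espionage_share(convenience):.3f}")  # around 0.50
```

The overall share in this made-up population is 0.2 × 0.5 + 0.8 × 0.05 = 0.14, so the convenience sample overstates it more than threefold if read as a statement about all breaches; read as a statement about large breaches, it is on target.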

Once we identify the sample bias, we address it through our choice of words and presentation of numbers. As an example, we saw 40 out of 46 cases in manufacturing attributable to state-affiliated espionage. We could calculate the proportion of espionage in manufacturing to be 86.95652%. But that implies precision and confidence that [we] just don't have. Even without sample bias, because this is a sample, the best we could say is that the true proportion of espionage attacks is somewhere between 74% and 94% (with 95% confidence). If you add in the possible sample bias, we could confidently say that more than half of the large incidents in manufacturing -- those requiring external assistance -- were espionage-related. Contrast that with financial services, where about one in 30 hacking events were espionage-related. We could have a lot of confidence in saying that manufacturing proportionally sees more espionage than financial services, but very little confidence in quantifying exactly how much.
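The 40-out-of-46 example can be checked directly. The interview doesn't say which interval method produced the 74% to 94% range; the Wilson score interval sketched below in Python is one standard choice for a binomial proportion, and with z = 1.96 it reproduces that range. As Jacobs notes, such an interval only accounts for sampling variability, not for sample bias.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% confidence)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


low, high = wilson_interval(40, 46)
print(f"point estimate: {40 / 46:.1%}")             # 87.0%
print(f"95% interval:   {low:.0%} to {high:.0%}")   # 74% to 94%
```

A naive normal-approximation (Wald) interval, p̂ ± 1.96·√(p̂(1 − p̂)/n), would instead give roughly 77% to 97% here, which is one reason the Wilson form is often preferred for small samples or proportions near 0 or 1.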

By collecting this data, even with the sample bias, we are able to reduce our uncertainty -- which is quite high in information security -- around a great number of questions. That's really the goal here. So we gather, analyze and learn what we can, adjust and try again. Hopefully, we're improving on our methods as we go.

Ranum: That's a really good explanation of sampling bias. I used to lose my mind when I'd see studies in computer security based on obviously uncontrolled surveys like the old CSI/FBI studies. Really, the computer security community -- and perhaps the public in general -- need a better understanding of the benefits, limits and capabilities of the social sciences. Do you have any suggestions for what we can do to improve the security community's understanding of statistics? I remember one time when you schooled me pretty hard...

Jacobs: Honestly, I think you helped school me too, when you challenged me to predict how you'd do over a night of gambling. And, of course, the honest answer is: "I don't know exactly." But I tried to show that we can reduce our uncertainty and answer with a distribution of possible outcomes that depends on the decisions you make, since different bets offer varying risk and reward.
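Answering "how will I do tonight?" with a distribution rather than a single number can be sketched as a Monte Carlo simulation. The bets and parameters here are assumptions for illustration: one-unit bets on an American (38-pocket) roulette wheel, either on red (18 winning pockets, pays 1 to 1) or on a single number (1 winning pocket, pays 35 to 1). Both bets lose 2/38 of a unit per spin on average, but the spread of outcomes over a night differs enormously, which is the varying risk and reward Jacobs describes.

```python
import random
import statistics

# Two bets on an American (38-pocket) roulette wheel:
# (probability of winning, units won on a win); a loss costs one unit.
BETS = {
    "red": (18 / 38, 1),
    "single number": (1 / 38, 35),
}


def night_of_roulette(bet, spins, rng):
    """Net result, in betting units, of placing the same one-unit bet `spins` times."""
    p_win, payout = BETS[bet]
    return sum(payout if rng.random() < p_win else -1 for _ in range(spins))


def simulate(bet, nights=10_000, spins=100, seed=7):
    """Distribution of net results over many simulated nights."""
    rng = random.Random(seed)
    return [night_of_roulette(bet, spins, rng) for _ in range(nights)]


for bet in BETS:
    results = simulate(bet)
    cuts = statistics.quantiles(results, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
    print(f"{bet:>13}: mean {statistics.mean(results):6.2f}, "
          f"5th-95th percentile {cuts[0]:.0f} to {cuts[-1]:.0f}")
```

Both means should come out near −5.3 units (100 × −2/38), while the single-number bet's 5th-to-95th percentile range is several times wider than red's: the same expected loss, very different nights.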

I think the first step towards improving our understanding of statistics is just raising awareness, not just for others, but for ourselves. Generations of people have developed methods to learn from data and once we tap into that, we'll find the world of data analysis opens up avenues for learning we couldn't see before.

About the author:
Marcus J. Ranum, chief security officer of Tenable Security Inc., is a world-renowned expert on security system design and implementation. He is the inventor of the first commercial bastion host firewall.

This was last published in November 2013
