Marcus Ranum: Let’s jump right into this, shall we? You know I’ve been a pretty vocal skeptic of the idea we can manage risk in computer security -- especially since we’d have to understand risk first and it doesn’t seem like we do a good job of that. It seems almost unfair to point to nuclear accidents and Wall Street crashes as examples of risk management failures, but, really -- is that unfair?
Bob Blakley: You’re joking, right? Of course it’s fair to point out that these things were failures of risk management. Take Chernobyl. The Chernobyl accident has certainly already cost more than the value created by the Chernobyl reactor over its lifetime, and the costs will keep adding up for the foreseeable future. Furthermore, the accident imposed costs on people who had nothing to do with the reactor’s construction, did not benefit from it, and were not given any say in any decision regarding its construction or operation. This is the textbook definition of a risk management failure – an incident that causes an entire initiative to have a net negative lifetime value, and whose costs cannot be contained within the organization that undertook the initiative. No sane person, having seen the consequences of the accident, would have built the Chernobyl plant in the form it was built AT ALL. It’s not clear (to me anyway) whether we could successfully have managed the risks of a nuclear reactor designed as Chernobyl was, though we could of course have gotten lucky. I believe there are reactor designs -- certainly fusion reactors and possibly also some types of fission reactors -- for which the risks could be managed. But that’s not necessarily good news for security, because (except for terrorist threats) reactor risks aren’t like information security risks – the security risks are worse! Natural risks, like earthquakes and tsunamis, don’t analyze reactors and try to figure out what the worst possible sequence of events is. Only people do that. So natural risks aren’t ideal worst-case adversaries. In security, we always have to assume we may be dealing with a worst-case adversary.
Marcus: Your point about the accident imposing costs that were much larger than the lifetime value of the initiative is well-taken. I keep encountering that problem again and again when it comes to security: The security practitioner who is waving the caution flag is competing against the best-case estimates of how a project will work out. It’s a matter of duelling fictions, and if one fiction is, “It’ll save us a million bucks and increase our customer awareness” and the other is, “It’ll possibly expose our customer database,” we know which fiction is more attractive. I guess security practitioners can be a little more definite, can’t we? It’s not like we’re saying: “It’s possible that people will try to hack the system” as in “9.0 earthquakes are possible.” We’re saying, “People will try to hack the system.” Can we say that what’s possible is a virtual certainty?
Bob: The bad guys have to feed their kids too, so yeah, we will be attacked. And the attackers will be as smart as we are. And either they will have significant resources at their disposal, or we’re protecting the wrong things. Risk management tends to turn into a political discussion, in which executives use phrases like “possible” and “unlikely” to put off investments in protection in favor of investments in dangerous, but potentially profitable functionality. This is a good reason to get rid of risk management and replace it (at least in the computer security context where we have real adversaries) with game theory. Doing so shifts the conversation from, “What are my odds of being hit by a meteorite,” to “Do you think the bad guys have guns?”
Marcus: A couple of years ago I read Charles Perrow’s Normal Accidents, and was fascinated by his argument that as systems become more connected and interdependent, our ability to predict failure modes eventually goes out the window. I’m not sure the idea of “connectedness” isn’t just subjective hand-waving, but it does sound compelling. You can’t help but think of the Japanese nuclear reactor situation: Their fail-safes generally worked, and we can all agree it could have been a whole lot worse. But it also could have been a lot better. How do you reason about a potential system failure in which two things that exceed your worst-case scenario happen at more or less the same time?
Bob: I’m not sure we can agree yet that the Fukushima Daiichi reactor accident could have been a whole lot worse. One of the things the Sendai earthquake and its aftermath have really brought home to me is that certain kinds of accidents (including nuclear reactor accidents) happen in slow motion. Solon’s advice to Croesus was right: Wait until the end to judge how bad things are. But to answer your question rather than dispute your premise, here’s how I think we ought to reason about potential system failures: At the zeroth order, we should imagine the worst possible sequence of events and estimate the consequences. Unless we’re sure in this instance we can either prevent a disaster, or recover from the disaster and pay for all the consequences, and not just direct consequences to ourselves, we shouldn’t proceed with building the system. After that analysis, we can move on to first-order considerations, like how much to spend on controls, what controls to implement, how to allocate liability for failures, how to assess penalties for malfeasance and incompetence, and so on.
Regarding Perrow’s notion of connectedness, I think what’s really going on is nonlinearity -- or extreme sensitivity to initial conditions. The problem in complex systems is that small changes can have large and unpredictable consequences, and the number of event sequences is so large that we can’t enumerate the space of possibilities and figure out which sequences of events we need to avoid.
Marcus: If we bring this stuff down to system/network security, it seems we’ve got the same problem. If we think of a security breach as a form of failure, then it seems like we’re back to trying to predict the future based on our current expectations, even though we know the future is not guaranteed to look like the present. I guess I keep coming back to the idea that some systems just shouldn’t be connected to anything else, ever, period. But that’s just as ridiculous in practice as saying, “No nukes, ever, period.” The cat is out of the bag. In fact, the cat is out of the bag and is playing in the freeway.
Bob: Well, we shouldn’t try to predict the future, but we should know the worst case. If the worst case is too bad to tolerate, we should do something different. If the worst case, or lots of cases, is impossible to understand, then we should do something we can understand. Your point about extrapolation is very important too. Lots of risk management failures are caused by assumptions that the future will be like the past. The future is often like the past -- and sometimes for deep reasons -- but sometimes important things change and our assumptions don’t change along with them. When that happens, our models give nonsense answers, and we get nasty surprises. There’s pretty good evidence that climate change is currently changing severe flood frequencies in a way that makes insurance policy pricing very inaccurate, for example.
Marcus: I think that’s where I get really uncomfortable with risk management for computer security. In general, the worst case that’s sold to management is nothing like the actual worst case. So I start off by questioning if our initial assumptions are even in the ballpark, before we get into the messy reality of the future. In IT the future is not at all like the past, unless you’re looking at fairly narrow aspects of it, like hard drive capacity or processor complexity. None of us thought the Internet would be a big deal back in the 1980s (because, if we did, we’d own our own private countries, by now) and none of us could have predicted social networking in the 1990s. I’m still dumbfounded by the information about themselves that people rush to publish online -- and even more dumbfounded when they complain that marketers trade that information. Something like auto insurance can be tied to long-term observation that young males are worse drivers than females, but there’s no long-term anything about the Internet, yet.
Bob: We could have a long discussion about whether security professionals or social changes will solve the information sharing problem. But your point is right on; the Internet is not just a new thing -- it’s a new kind of thing. It’s hugely nonlinear; our brains evolved to deal with linear phenomena. Not only do we not have long-term information about the Internet, we probably wouldn’t even have the right tools to make sense of long-term information about the Internet if we had it.
Marcus: I’m also deeply concerned that the people who are advocating a potentially risky thing are often the people who are performing the cost/benefit projections for it. Whatever you think of cloud computing, it seems a meta-risky behavior to have the business units that are pushing for it also be the ones who are predicting how much it’ll save and negotiating the service-level agreements. That seems suspiciously like having the oil companies performing their own studies that determine deep-water drilling is peachy keen. It makes me wonder what “determine” means in a given context.
Bob: Externalities are a huge problem, as are conflicts of interest and plain technical ignorance on the part of management, the political class and the general population. These are hard problems to fix. Externalities can only be eliminated by good public policy. Hammurabi understood this better than the U.S. government. Hammurabi regulated his engineers thus: “If a builder builds a house for a man and does not make its construction firm and the house collapses and causes the death of the owner of the house, that builder shall be put to death. If it destroys property, he shall restore whatever it destroyed, and because he did not make the house firm, he shall rebuild the house that collapsed at his own expense. If a builder builds a house for a man and does not make its construction meet the requirement and a wall falls, that builder shall strengthen the wall at his own expense.”
Alan Greenspan let lending institutions regulate themselves, and was surprised when that failed. After the 2008 crash, he said: “Those of us who have looked to the self-interest of lending institutions to protect shareholders’ equity, myself included, are in a state of shocked disbelief.” That’s how far backward we’ve moved in 4,000 years. Greenspan was explicit about where the problem lay -- it was a risk management failure. Here’s what he said: “This modern risk-management paradigm held sway for decades. The whole intellectual edifice, however, collapsed in the summer of last year.” It was the only important thing he was right about.
Marcus: I’m surprised that nobody in IT has hit on the too-big-to-fail strategy yet. Or have they…?
Bob: You weren’t supposed to bring that up. We don’t want anyone to know that our continued failure to solve the security problem guarantees us lifetime security. Could we edit this part out?
Marcus: Bob, as always, it’s fascinating to get your take on this stuff. I really appreciate your taking the time to school me; as usual, when I think about risk management, I feel more uncertain leaving than I did going in.