Metrics that get results have proved challenging for many security teams. In addition to figuring out what to measure, security professionals have to know which measurements and analytics their organization's executives and board actually care about, and many CISOs falter at this step.
Diana Kelley knows how to build a successful metrics program. As the executive security advisor to IBM Security Systems, she leverages 25 years of IT security experience to provide advice and guidance to CISOs and security professionals. Kelley has contributed to the IBM X-Force report and frequently publishes on the company's Security Intelligence and Smarter Planet blogs. She is also a faculty member with IANS Research and serves on the advisory board for InfoSec World 2015 and the IBM Network Science Research Center Smart Grid Advisory Group. Marcus Ranum talked with Kelley -- a former SearchSecurity.com contributor -- about security metrics, risk appetite, and short- and long-term ways that security professionals can improve their metrics efforts.
Marcus Ranum: It seems that metrics are one of the areas where computer security practitioners have trouble getting traction. Why do you think that is? And what's your first suggestion for how an organization can get started with a metrics program?
I suppose I loaded that question when I said 'metrics program.' Do we even need a specific metrics effort or is there some other approach to getting a clue of what's going on?
Diana Kelley: My first suggestion is to figure out what the organization wants to achieve with the metrics, because that will inform what needs to be measured as part of the metrics and analytics program, or even if the company wants or needs a metrics program. Measurements are fairly simple: How many high sev vulns [severe vulnerabilities] are there in our public-facing websites? Or, even, how many public-facing websites do we have?
Metrics compare or analyze the measurements to get to some answer. For example: We had seven high sev vulns last week, five high sev vulns yesterday and 10 today, so what does that tell us about the state of the public websites? Getting deeper into the analytics, the kind of public-facing app, the language or stack, and the development team can all be added to the mix to turn the measurements into metrics. But there's no point gathering all that data if it's answering a question no one at the company cares about. And sometimes companies want answers that they just can't get the measures for, such as 'Am I safer today?'
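As a rough illustration of that distinction, the sketch below treats the three counts quoted above as raw measurements and derives one simple metric from them: the change between consecutive observations. The helper name `changes` and the way the data is labeled are assumptions made for this example only, not part of any program Kelley describes.

```python
from typing import List, Tuple

# Measurements: raw counts of high-severity vulnerabilities on public-facing
# websites, using the figures quoted in the interview.
measurements: List[Tuple[str, int]] = [
    ("last week", 7),
    ("yesterday", 5),
    ("today", 10),
]


def changes(points: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical helper: turn raw counts into a metric, here the change
    between each observation and the one before it."""
    result = []
    for (_, previous), (label, current) in zip(points, points[1:]):
        result.append((label, current - previous))
    return result


trend = changes(measurements)
for label, delta in trend:
    print(f"{label}: {delta:+d} high-severity vulnerabilities")
```

The counts alone say little; the change between them is what prompts the question of why the number doubled overnight. Adding dimensions such as application type or development team would be the next step.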
It's important to remember that different roles and groups will care about different metrics. An auditor will be focused on metrics related to compliance while a CIO may care more about performance and improvement.
Ranum: I've been pretty notoriously dismissive of risk management on the basis that it's 'garbage in, garbage out,' and I finally realized that the real problem is that we, as an industry, don't have a good way of generalizing the outputs from our metrics (or we don't keep them at all). How can, say, a bank produce useful security metrics that a hospital might be able to consume? Are metrics always an inward-facing process, or is there a useful way of sharing our experiences?
Kelley: Risk management in the classic sense -- determine the risk LxI (likelihood x impact), then decide the organization's risk appetite for each risk, and then figure out what controls and processes need to be in place to contain the risk at the approved risk level? Do you see 'garbage in, garbage out' because LxI are so hard to quantify and calculate? Or is it that there are just too many risks to consider to do this well? Because those are the big hurdles I see companies dealing with in risk management.
The other big one is asking the IT team to set risk appetite. IT's role is to present risk calculations to the business; then the board needs to figure out the risk appetite. But very often the business wants IT to set appetite. Another messy area is translating IT risk into business risk.
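The classic calculation can be sketched in a few lines. Everything specific here is an assumption made for illustration: the 1-to-5 scales, the risk names, the scores and the appetite threshold are all invented, and in practice the appetite value would be set by the board rather than by IT.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # LxI: likelihood multiplied by impact
        return self.likelihood * self.impact


def exceeds_appetite(risks: List[Risk], appetite: int) -> List[Risk]:
    """Return the risks whose LxI score is above the approved appetite."""
    return [risk for risk in risks if risk.score > appetite]


# Invented risk register, for illustration only.
register = [
    Risk("unpatched public website", likelihood=4, impact=4),
    Risk("lost backup tape", likelihood=2, impact=3),
    Risk("DDoS against online banking", likelihood=3, impact=5),
]

BOARD_APPETITE = 12  # hypothetical threshold supplied by the business

over = exceeds_appetite(register, BOARD_APPETITE)
for risk in over:
    print(f"{risk.name}: score {risk.score} exceeds appetite {BOARD_APPETITE}")
```

The arithmetic is trivial. The hard parts are the ones Kelley names: getting defensible likelihood and impact numbers, and getting the business to own the threshold.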
I believe there are many metrics that can be useful across verticals, but they'd be part of the overall metrics and risk program. Each individual organization then needs to understand what it wants to do with those metrics. To make that less abstract, consider something like a series of DDoS attacks coming from specific IP addresses that are targeting the bank. These IP addresses are bots or shadow sites. Although the health system isn't being 'DDoSed' by those IP addresses yet, it may want to be proactive and block those IP addresses or work with a DDoS prevention company that gathers IP reputation information from across verticals to do the blocking for them.
On the other hand, if the bank is looking at risks related to transactional latency in the milliseconds, those measures may have less importance to a health system that can tolerate a second of transactional latency.
Ranum: Alex Hutton likes to say his favorite metric to ask for is 'What are my riskiest business units?' What do you think of that? I'm sort of leery of the idea that one organization's metrics might work for others or for everyone. Do you favor organization-specific metrics or more 'meta' ones?
Kelley: Well, first of all, I think Alex Hutton [director of operations, risk and governance at a financial institution] is wonderful, and anyone who hasn't heard him talk about risk and metrics needs to get to one of his talks as soon as possible. The question, 'What are my riskiest business units?' is fantastic, but incomplete. Because, again, we need to get back to whether or not LxI have been calculated properly and for the right risks. And then the organization needs to ascertain the [risk] appetite for the business units. I've spoken to an organization that did a cost-benefit analysis of PCI compliance versus taking the hit of fines and a failed report on compliance. They went with taking the hit. That's a very 'personal' decision that the company made for risk acceptance. Most organizations would say being non-compliant with PCI is very risky -- but that company didn't.
That's why I agree with you about being leery of one-size-fits-all metrics. Many standard measurements have value across verticals (days to patch), but what that means for risk at each company will vary by type of device -- is the unpatched system a medical device that needs to be re-certified if it's patched, or a website? -- and risk requirements of the organization.
Where I think it all comes down is that simple measures can be meta, but turning those into risk metrics (LxI) and setting appetite need to be unique to the vertical and the specific organization. One of the big problems is that organizations want an easy answer, but risk metrics and analytics aren't easy.
Another issue we haven't touched on yet is misinterpreting the numbers. I did a piece on this for IBM's Security Intelligence blog a while back. Sometimes we look at stats that seem to indicate 'people are living longer' and want that to mean we can all be the non-ape version of the fifth Earl of Gonister in Aldous Huxley's After Many a Summer. But dig a little on that data and it turns out we're not living much longer, we're just better at not dying younger.
Is your company truly at a lower risk level because patches are applied more quickly? Or are you just unpatched for a smaller amount of time, but now suffering more outages due to buggy patches? It's tricky.
Ranum: Like you say, it's amazingly tricky. One of the things that drives me nuts about all this is that you can have an axis to the problem, which you haven't considered, and when you slice the data across that axis, a whole new reality falls out. For example, we might notice that longevity in the U.S. has gone up slightly, but if we slice the longevity across wealth, we discover that the wealthy are doing much, much better. They are pulling the average for everyone up, while some sectors are actually doing worse. This makes me extremely leery when I see a large complex metric that amounts to a roll-up of accumulated information: Sometimes, it reveals; sometimes, it obscures.
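The effect Ranum describes is easy to reproduce with a metric closer to home. The sketch below uses invented days-to-patch figures (the quarters, system types and numbers are all hypothetical) in which the rolled-up average improves while one slice gets worse.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration: (quarter, system type, days to patch)
records = [
    ("Q1", "website", 30), ("Q1", "website", 28),
    ("Q1", "website", 32), ("Q1", "website", 30),
    ("Q1", "medical device", 60), ("Q1", "medical device", 60),
    ("Q2", "website", 10), ("Q2", "website", 12),
    ("Q2", "website", 8), ("Q2", "website", 10),
    ("Q2", "medical device", 90), ("Q2", "medical device", 90),
]


def rollup(rows):
    """Average days to patch per quarter, ignoring system type."""
    groups = defaultdict(list)
    for quarter, _, days in rows:
        groups[quarter].append(days)
    return {quarter: mean(values) for quarter, values in groups.items()}


def sliced(rows):
    """Average days to patch per (quarter, system type)."""
    groups = defaultdict(list)
    for quarter, kind, days in rows:
        groups[(quarter, kind)].append(days)
    return {key: mean(values) for key, values in groups.items()}


overall = rollup(records)
by_type = sliced(records)

print("Roll-up:", overall)
for key in sorted(by_type):
    print(key, by_type[key])
```

In this made-up data the roll-up falls from Q1 to Q2, which looks like progress, yet the medical-device slice goes from 60 days to 90. The fast-moving websites pull the average down and hide the slice that is getting worse.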
Kelley: Exactly. Dependencies that weren't accounted for, like wealth in longevity, can change the analysis and 'answers' significantly. Also, I do think sometimes people influence what they are looking for and the results they want to see. Risk calculations should be pretty cold and objective; risk appetite decisions can be much more subjective. But cold and logical doesn't always work for people. I'm thinking of Fight Club and how the company Ed Norton's character works for only cares about the cost to pay off families of the dead and injured people and to deal with the PR fall-out versus the cost to recall all the cars. That's horrifying, but if cash outlay is the biggest risk, it makes sense. In IT risk, many companies make the personal interpretation that employees are more trustworthy and it's the outsiders from a foreign country that are the bigger risk. But the data doesn't bear that out.
Ranum: I love reading Paul Krugman's Economics and Politics blog on NYTimes.com because he's really delightful at explanatory metrics; whether you agree with him or not about meta-economic theories, he sure explains his position well.
Kelley: That's a great point -- Paul Krugman's blog could help IT risk [professionals] a lot because he shows how to explain complex models and analysis in a clear way.
Ranum: I worry that people's understanding of big data is that it is some kind of magical thing that's going to figure out their data for them, whereas I see it mostly as an exploratory tool. It's like a friend of mine once said about his Hong Kong tailor: He can make you the best suit that you know how to ask for.
What's been the most successful use of metrics you've seen in a business context?
Kelley: Ha! That's a perfect analogy for big data. If you don't know what you're looking for, how do you know you're gathering the right data? In the measure-everything approach (which is a good one) the problem shifts to how do you know you're writing the right analysis rules to find risk-related information? One kind of cool thing about big data is that we can look for patterns and see if those relate to causality and get better with rules over time. At least big data is an attempt to gather and measure everything that we haven't seen before.
Short-term activities can be as simple as looking for the patient-zero laptop or device where re-infection of malware originates -- maybe that device is syncing to an infected one when it's off the corporate network -- or looking for high levels of failed logins, which can point to brute force attacks or to a restrictive password aging policy that's frustrating users.
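A first pass at the failed-login check might look like the sketch below. The event format, the addresses (taken from documentation and private ranges) and the threshold are assumptions for illustration; real authentication logs would need parsing first, and a sensible threshold depends on the environment.

```python
from collections import Counter

# Hypothetical login events: (account, source address, succeeded)
events = [
    ("alice", "203.0.113.9", False),
    ("bob", "203.0.113.9", False),
    ("carol", "203.0.113.9", False),
    ("dave", "203.0.113.9", False),
    ("erin", "10.0.0.12", False),
    ("erin", "10.0.0.12", True),
    ("frank", "10.0.0.40", False),
]

FAILURE_THRESHOLD = 3  # assumed cut-off for "high levels" of failures


def failures_by_source(evts):
    """Count failed logins per source address."""
    return Counter(source for _, source, ok in evts if not ok)


def noisy_sources(evts, threshold):
    """Sources with at least `threshold` failed logins, sorted for stable output."""
    counts = failures_by_source(evts)
    return sorted(source for source, n in counts.items() if n >= threshold)


print(noisy_sources(events, FAILURE_THRESHOLD))
```

Many failures from one source across several accounts looks more like a brute force attempt. Scattered single failures across many internal users would be closer to the frustrated-user pattern, and grouping the same events by account instead of by source is one way to look for that.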
Long-term advanced analytics could get us to a much better understanding of how risk evolves in an organization. Tracking employee reviews could reveal that employees who get bad reviews are more likely to go rogue and try to steal data or perform other malicious activities. Or that certain seemingly unrelated attack patterns -- like ping sweeps followed by a rash of negative comments on Twitter or Facebook and an increase in phishing emails -- mean the company's been targeted and needs to tune alert-response levels.
IT risk metrics and analytics are tough and imperfect right now. But if we don't get started looking for at least a few answers, we're not going to figure out what we're doing wrong or missing, and won't be able to get to a better place. Actuarial tables adjust as new dependencies and risk factors are uncovered. They're not perfect, but they're good enough to keep most big insurance companies in business. IT risk metrics may never be pinpoint perfect -- is anything? -- but we can definitely do better.
About the author:
Marcus J. Ranum, chief security officer of Tenable Security Inc., is a world-renowned expert on security system design and implementation. He is the inventor of the first commercial bastion host firewall.