How do you measure the effectiveness of your security program? Marcus Ranum uncovers how one Ivy League school uses information security metrics to improve and automate its processes in this Q&A with Joel Rosenblatt, the director of computer and network security at Columbia University's Information Security Office.
A Columbia alumnus, Rosenblatt has tapped the keyboards at the New York institution on the Upper West Side of Manhattan since 1973, first as an engineering student, then as a mainframe systems programmer and manager. Rosenblatt got hooked on security metrics (and nailing the "bad guys") when he was asked to build the university's security program in 2000. He used security measurements to enhance a range of projects from identity management to asset protection.
All that data has paid off. Information security metrics have provided concrete measurements to justify automated processes that monitor networks and systems, and even take care of compliance issues related to online copyright infringement.
I've known you for, what, nearly a decade now? And I've always been impressed by the kind of metrics you keep and how you use them. How did you get started with your metrics program?
Joel Rosenblatt: I have a background in engineering and have always believed that if you can't measure it, it didn't happen. To me, metrics should be an intrinsic component of any program, because without some kind of measurement, you will never be able to improve on the process.
When I talk to some of my peers, often, their vision of metrics seems to be "stuff we present to the big shots." But in my experience, that's not how it actually happens in real life. The metrics inform the stuff we present to the big shots. Is that your experience? Are there specific metrics you share up the hierarchy? Or do you share metrics mostly with your peers?
Rosenblatt: I have always found that the best metrics to share with the big shots are the ones that they are asking for. These measurements are not always the most meaningful, nor are they the ones that I care about, but they are the ones that get the job done. When I started doing security, they wanted policies, so 'how many [security] policies we have' was the metric we provided; not really useful or interesting, but it was exactly what they wanted to know.
My view on metrics is that they provide a feedback loop into the security system: Your metrics should allow you to tune your systems to make them better. I also believe that metrics should be able to be used to compare solutions: Is my Bayesian spam filter as good as your Barracuda firewall? This means that metrics should be normalized so you are comparing apples to apples. I have been able to share some metric data with peers, which is useful when there is a pervasive problem, such as DMCA [Digital Millennium Copyright Act] complaints.
Normalizing data for sharing is a really hard problem. How have you done that?
Rosenblatt: Normalization for this [comparison] means that your numbers are either unit-less or that the units match. This was one of the things I learned early in engineering school for exams: Figure out what the units are -- meters per second -- and then make sure that the answer you get is really in meters per second. You are guaranteed to have the wrong answer if it's in feet per second.
In security metrics, if you are looking at the number of compromised machines and you need to compare it, calculate the number in infections per thousand desktops -- this way you are comparing like numbers. If one shop has 100,000 machines and got a bug that affected 200 machines, and the other has 25,000 machines and got the same 200 infections, you know right away that the second shop is doing a poorer job of defending the desktops. Without knowing the total number of machines, 200 is a meaningless number. In metrics, context is king.
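The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration of the normalization Rosenblatt describes; the function name `infections_per_thousand` and the variable names are assumptions, and the inputs are the two shops from his example.

```python
def infections_per_thousand(infected: int, total_machines: int) -> float:
    """Normalize a raw infection count to infections per 1,000 desktops."""
    if total_machines <= 0:
        raise ValueError("total_machines must be positive")
    return 1000 * infected / total_machines


# The two shops from the example: same 200 infections, different fleet sizes.
shop_a = infections_per_thousand(200, 100_000)  # 2.0 per thousand
shop_b = infections_per_thousand(200, 25_000)   # 8.0 per thousand

print(f"Shop A: {shop_a:.1f} per 1,000 desktops")
print(f"Shop B: {shop_b:.1f} per 1,000 desktops")
```

Once both counts are expressed in the same unit, the second shop's rate is four times the first's, which the raw figure of 200 hides.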
You said something the other day -- about how you collect metrics on vulnerability rates in systems based on how they are administered, and by whom -- that really floored me. Can you tell us a bit about that? I thought your results seemed counterintuitive.
Rosenblatt: Sure, our security setup -- called Point of contact and Incident Response System, or PaIRS -- is designed to find compromised systems on our network based on the behavior of the system. We have two kinds of systems, managed and "free love." The managed systems have an IT group that is responsible for their care and feeding; the free love systems plug into our network and get an IP address, no questions asked. The PaIRS system is fully automated: When a managed system gets compromised, the IT support unit gets notified; when a free love system gets compromised, it gets captured. If you look at the numbers, the percentage of managed systems that get compromised is much higher than that of the free love group. I attribute this to the fact that the free love people use automatic updates; the managed people have to wait until their IT group updates their systems.
In my experience, a good way to establish metrics is to analyze a business process, then instrument it thoroughly to understand better how you do it. You mentioned DMCA complaints; can you tell us a bit about how you apply metrics and automation to that process?
Rosenblatt: Going back about 11 years or so, when we first started getting DMCA complaints, the process was done by hand. A complaint came in; a security tech took the information down and then tried to track down the user -- using timestamp, IP address, DHCP logs and network jack tables. The jack was turned off, and we waited to see who complained. We often had inventive students who just found another jack and were back in business. We even had one who went out and bought a 100-ft cable, ran it out the window of the dorm room down to the next floor and plugged in there. Back in 2007, we started seeing more than 400 complaints in one month. Processing these complaints required a full-time engineer, whose time could be better spent. At that point, I decided to fully automate the process.
Today, we have a fully automated system: The email comes in to a service account. It is parsed for relevant content. The complaint is verified -- IP address, port, traffic -- the MAC address of the machine is determined and 'captured.' When the user brings up a Web browser, they are taken to a webpage, which explains that they were caught for a copyright violation. They are taken to an online class on copyright, then to a quiz on the class, and if they pass, they get a valid IP address back. A letter is generated to the dean of their school -- or human resources -- and they get to sit down and have a chat. The details of the case are stored in our Service-Now ticketing system, and I can pull up metrics on the current and past states of the tickets being processed.
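One way to picture that workflow is as a small state machine in which each complaint moves through a fixed sequence of stages. The Python sketch below is purely illustrative and is not Columbia's implementation: the stage names, the `Complaint` class, the `ALLOWED` table and the `REJECTED` stage for complaints that fail verification are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    RECEIVED = auto()     # email arrived at the service account
    PARSED = auto()       # relevant content extracted
    VERIFIED = auto()     # IP address, port and traffic checked
    CAPTURED = auto()     # MAC address found and machine captured
    CLASS_DONE = auto()   # user completed the online copyright class
    QUIZ_PASSED = auto()  # user passed the quiz on the class
    RESTORED = auto()     # user got a valid IP address back
    REJECTED = auto()     # assumption: complaint failed verification


# Hypothetical transition table: which stages may follow which.
ALLOWED = {
    Stage.RECEIVED: {Stage.PARSED},
    Stage.PARSED: {Stage.VERIFIED, Stage.REJECTED},
    Stage.VERIFIED: {Stage.CAPTURED},
    Stage.CAPTURED: {Stage.CLASS_DONE},
    Stage.CLASS_DONE: {Stage.QUIZ_PASSED},
    Stage.QUIZ_PASSED: {Stage.RESTORED},
    Stage.RESTORED: set(),
    Stage.REJECTED: set(),
}


@dataclass
class Complaint:
    ticket_id: str
    stage: Stage = Stage.RECEIVED
    history: list = field(default_factory=lambda: [Stage.RECEIVED])

    def advance(self, to: Stage) -> None:
        """Move to the next stage, refusing any transition not in ALLOWED."""
        if to not in ALLOWED[self.stage]:
            raise ValueError(f"cannot go from {self.stage.name} to {to.name}")
        self.stage = to
        self.history.append(to)


# Walk one made-up complaint through the full happy path.
case = Complaint("TICKET-1")
for step in (Stage.PARSED, Stage.VERIFIED, Stage.CAPTURED,
             Stage.CLASS_DONE, Stage.QUIZ_PASSED, Stage.RESTORED):
    case.advance(step)

print([s.name for s in case.history])
```

Recording every transition in `history` is what makes the "current and past states" of a ticket queryable later, which is the property the metrics depend on.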
The interesting metric is that, by automating, we can now process 100% of the incoming complaints. We find that repeat offenders account for less than 1%. Without the ability to capture the data and process it, we would never be able to tell whether our process was effective.
What metrics did you keep and produce to justify doing that? Presumably, since you were tracking DMCAs at 400 per month in 2007, you retain that kind of data. What other metrics did you keep?
Rosenblatt: I keep raw data in our ticketing system -- incident date, user ID, first or second incident, validation of the incident and so on. Once the process was automated, it was obviously much easier to keep the data. The presentation of the data was a simple matrix -- 'month by year' with 'number of incidents' in the box and a line graph color coded by year, with 'number of incidents per month' driving the curve. We also have a matrix with 'how many total incidents by count,' including 'first,' 'second' and 'the percent of total.' This metric was the key one used to show the effectiveness of our program.
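Both of those views can be derived from nothing more than the incident date and user ID. The Python sketch below shows one possible way to build them; the records are made-up sample data, not Columbia's figures, and the function names `month_by_year` and `incidents_by_ordinal` are assumptions.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up sample records: (incident date, user ID). Not real data.
incidents = [
    (date(2007, 3, 5), "u1"),
    (date(2007, 3, 9), "u2"),
    (date(2007, 4, 1), "u3"),
    (date(2008, 3, 2), "u1"),  # second incident for u1
    (date(2008, 5, 7), "u4"),
]


def month_by_year(records):
    """Build the 'month by year' matrix: matrix[year][month] -> incident count."""
    matrix = defaultdict(Counter)
    for when, _user in records:
        matrix[when.year][when.month] += 1
    return matrix


def incidents_by_ordinal(records):
    """Count how many incidents were a user's first, second, and so on."""
    seen = Counter()      # incidents seen so far per user
    ordinals = Counter()  # ordinal (1 = first, 2 = second) -> count
    for _when, user in sorted(records):  # process in date order
        seen[user] += 1
        ordinals[seen[user]] += 1
    return ordinals


matrix = month_by_year(incidents)
ordinals = incidents_by_ordinal(incidents)
total = sum(ordinals.values())
percent = {n: 100 * count / total for n, count in ordinals.items()}

for year in sorted(matrix):
    print(year, dict(sorted(matrix[year].items())))
print("first:", ordinals[1], "second:", ordinals[2])
```

In this sample, `percent` gives the share of all incidents that were first or second offenses, which corresponds to the 'percent of total' column described above.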
Back in 2008, I wrote an article called "Security Metrics: A Solution in Search of a Problem," which was published in the Educause Quarterly. It sums up how I feel about metrics: They can be used to shine a light on a lot of security problems, especially to show how valuable the security process is, but only if you keep them.
Columbia University and one of its affiliated hospitals, New York-Presbyterian, reached a HIPAA settlement in May after reporting the exposure of electronic protected health information in September 2010. What happened, and are you able to comment on any changes in organizational policies and procedures?
Rosenblatt: The hospital is responsible for the security of the systems there. It is my understanding that there were many changes in policies and procedures after this exposure in 2010. It is their belief -- the hospital security team's -- that with the current systems that they have in place, an exposure such as this would be caught before any data was released.
Joel, thanks for your time. It's always good to catch up with you!
About the author
Marcus J. Ranum, chief security officer of Tenable Security Inc., is a world-renowned expert on security system design and implementation. He is the inventor of the first commercial bastion host firewall.