Open source came under fire last month after the chain of trust was compromised for event-stream, a popular module, in an incident that may have affected thousands of dependent packages and millions of users.
Adam Baldwin is director of security at NPM Inc., the Oakland, Calif., company that runs the open source NPM registry. He talked about the incident and the NPM security initiatives he has been working on since NPM acquired ^Lift Security in February to establish NPM's internal security team.
NPM is the default package manager for Node.js. A user discovered that malicious code had been added to the event-stream package and had gone unnoticed for over a month. The poisoning of the package was traced back to a malicious user who had taken over ownership of the module.
How did you view the recent poisoned package in terms of NPM security?
Adam Baldwin: Open source is wonderful, right? We get the advantage of a large ecosystem of packages and a lot of different levels of quality of packages in there. There [are] some people just trying it out, and some [packages] that are really maintained by organizations and kept up to rigorous standards. They're all mixed together in this ecosystem.
So, we've got this web of dependencies, and companies consume those. And at the end of the day, you're responsible for what you require. Hopefully, NPM is going to be able to provide some better tooling to help with that. It is a supply chain security problem for consumers of the software.
What do you see as the best ways to scale NPM security processes?
Baldwin: That's a challenge. There's a maintainership side that falls on the ecosystem to keep things going. The problem is that it looks very cut-and-dried when you look at the popularity of a particular module, and you say, 'OK. Well, it's above a certain number of downloads.' Where do we cut that off? Do we take 10,000 downloads a week? A million downloads a week? And certainly, as that goes up, the importance of it is going up, because more and more people are relying on it. So, that's an honest indicator.
If we already have a lot of users using a module, they will eventually uncover a problem. Now, that's not perfect. In fact, nothing in security is perfect, right? We, as NPM, are working on tooling to help understand context when those sorts of events of interest happen. A change of ownership is interesting. A new publication from an IP address that's over Tor, that's interesting. That gives you more context than just a giant dependency tree to go on.
We've already invested heavily in the security team. So, hopefully, we can continue to build tooling that will help consumers understand that context, because really we can't assert what's important to you. You might be using a module for a critical system that only gets a few hundred downloads a week, maintained by just a couple of people. So, instead, I think the solution is to provide tooling and context to consumers to better make decisions about what's happening to that dependency tree, that ecosystem that they rely on.
How do you view your job providing NPM security for the community?
Baldwin: Just like traditional malware, it's a cat-and-mouse game. It's going to go on forever. You come up with a new, fancy defense mechanism, and they're going to get around it. So, we're going to play that forever, but that doesn't mean that we shouldn't do anything. I think that a combination of automated internal tools -- because we're seeing the data first, we have that type of data that we can build on -- I think that we can do some really interesting things with that data to proactively block malware.
But, really, we need to enable our 11 million users to be able to, if they see something, say something and to be able to flag those for review. And that's where our team can come in, as well. We can provide the incident response team for the ecosystem to help respond and help understand if this is a threat. How deep does that threat go? Is the hacker who published that particular malicious module doing other things elsewhere in the ecosystem?
It's a process involving community, involving our internal tooling and team that we're testing.
We've touched on it a little bit here and there, but can you go through what kind of processes and security features you already have in place and what you're working on next?
Baldwin: Right now, the most recent security feature that we have for users is NPM audit. When the ^Lift Security platform was acquired, we brought [Node Security Platform], the command lines, into NPM. We took it from our small user base in the security platform to 11 million users, and we turned that on overnight. We made security top of mind. That's not as proactive; it covers known vulnerabilities and can identify malicious packages, but it does get good reach to our users.
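For readers unfamiliar with the feature, the audit command Baldwin describes ships with the npm command-line client (version 6 and later). A typical check against a project's dependency tree looks something like the following sketch:

```shell
# From a project directory containing package.json and package-lock.json,
# scan the full dependency tree against npm's security advisory database.
npm audit

# Emit a machine-readable report instead, which is useful in CI pipelines.
npm audit --json

# Automatically upgrade vulnerable dependencies to patched versions
# where a compatible fix exists.
npm audit fix
```

The plain `npm audit` command exits with a nonzero status when vulnerabilities are found, so it can also serve as a gate in an automated build.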
The most important thing, as well, in the registry is to protect accounts. So, we've got two-factor auth. Publishers can enforce two-factor authentication for modules that they have. As an example, if you bring in another maintainer, you can require that they have two-factor auth enabled. But, again, that doesn't protect you from the actively malicious author and things like that.
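As a sketch of the account protections Baldwin mentions, the npm client (as of version 6) exposes both personal two-factor authentication and per-package 2FA enforcement. The package name below is a placeholder, not a real module:

```shell
# Enable two-factor authentication on your own account for both
# login and publishing ('auth-only' would cover login alone).
npm profile enable-2fa auth-and-writes

# As a package owner, require that every maintainer of the package
# have 2FA enabled before they can publish a new version.
npm access 2fa-required my-example-package
```

The second command is the enforcement mechanism Baldwin refers to: bringing in a new maintainer does not weaken the package's publishing requirements.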
So, where we want to get to next -- and I don't have concrete details for you -- but behavioral analysis of modules, anomaly detection for publications and just understanding what hackers are doing [are] kind of the next step for me. Getting that data set and getting those signals, getting those signals out to our end users consuming those modules so they can take action. We'll likely see those crop up initially in things like NPM enterprise, but I want to see features like that protecting the entire registry. It's difficult to do at scale.
We've got over 800,000 packages. That's like 7 million individual versions of modules and such. That's a lot of code to cover, but there are patterns that are emerging. There's interesting tooling that we can build. That's kind of what I'm seeing [as our] next steps.
Would that kind of anomaly detection be able to cover all the packages hosted?
Baldwin: That's what I want. I want to be able to tell, 'Did you intend to publish this?' Going along with that is sensitive data exposure, too -- digitally hidden data, sensitive keys, things like that. People do that by accident. So, helping users protect themselves and having two-factor available to help users protect their accounts.
How do these NPM security features function in the case of a fork where poisoning the initial code may affect many other modules in the registry?
Baldwin: You kind of covered an edge case, because forks are a challenge. I don't have a solution for that, but what I can say, in this particular instance, is we saw people quote that event-stream was downloaded millions and millions of times because [it's an] interesting number to get people to be [aware of] the incident. But because it was such a targeted attack, it didn't impact a whole lot of users, and I'm doubtful it will impact those forks because of the required environment it needed to run in.
Now, that's not to say that the next one won't be different. The [owners of the] forks take on responsibility depending on when it was forked. If it was forked before those changes or that dependency went in, they're going to be fine, and then maybe you don't want to accept upstream changes. It's kind of one of those initial things. It might bring those changes in. It might not. It depends on the fork.
If you think of a fork as like a company, they're really responsible for understanding what those upstream changes are before they bring them into their forks.
With new NPM security features like anomaly detection alerts, would you send those alerts to a fork in order to bring awareness to potentially malicious upstream changes that might have occurred?
Baldwin: We don't really know. Forks are interesting in NPM. We don't actually know which modules are forks of which, because that sort of thing occurs in GitHub-land, where those projects are being maintained actively. In some cases, we can tell there are forks, given that they [point] to certain [parts] of the GitHub repo. We can do that analysis. I think that would be an initial challenge and one of those things that we could look at solving down the road. But, definitely, if you're using a fork of a module, you're potentially taking on additional risk, depending on whether or not it's an actively maintained one.