Virus definition updates from antivirus vendors, also known as DAT files, usually register as no more than a blip on an enterprise network's radar screen. However, preparing your enterprise for the rare faulty update is critical, because a buggy DAT file can wreak havoc on computers and system operations.
In fact, that very scenario played out in many IT departments recently. During the week of April 19th, a bad DAT file update from McAfee Inc. caused Windows XP Service Pack 3 (SP3) systems to crash and reboot repeatedly. McAfee reportedly attributed the problem to a DAT file that mistakenly quarantined a critical Windows process called svchost.exe.
When I heard this news, it reminded me of a horror story from one of my former CISO positions, when our antivirus vendor also sent out a buggy DAT file that disrupted our computers and system operations. The story goes like this:
AV DAT file freezes many computers
Because of concerns with the rapid distribution of worms, malware, botnets, etc., our organization had decided to allow automated antivirus updates of all enterprise systems whenever a DAT file was uploaded to our antivirus system server. Hence, we never pre-tested the DAT files; frankly, we never had a reason to suspect our vendor's quality assurance process would be lacking, as we had never had a problem in our multi-year history with this vendor.
Late on a Friday afternoon, however, I got a frantic call from our service delivery manager saying that the company "was under attack" and that I needed to be at the incident response area ASAP. Many of our workstations and some servers had gone to 100% CPU usage and did not respond to commands or troubleshooting measures. Essentially, most of the company's computers were frozen, and computer work -- such as email and word processing -- could not be done.
As I was driving to the incident response location, I thought through what might be happening: Perhaps a logic bomb had been triggered or a botnet distributed denial-of-service (DDoS) attack was in progress. So, with these possibilities in mind, I began to call members of my strong network of fellow security professionals to ask for their help and guidance. Before long, the incident response team at my company opened up a telephone bridge so I, my experts and my experts' experts -- including security pros from Microsoft -- could all discuss what was happening and brainstorm about the potential causes.
The incident response team also began troubleshooting and realized that the only affected computers were Windows XP computers with Service Pack 2 installed, and that we had received a DAT file from our antivirus vendor shortly before the computers froze. Using one of the unaffected computers, we contacted our antivirus vendor and downloaded a corrected DAT file. From there, we copied the working DAT file to the affected machines and returned them to normal, stable operation.
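The troubleshooting step in this story came down to finding what the frozen machines had in common. A minimal Python sketch of that kind of triage follows, assuming the team can export a simple inventory of hosts; the host names, DAT version numbers, field names and the 90% threshold are all hypothetical and are not taken from the actual incident.

```python
from collections import defaultdict

def suspect_attributes(inventory, attributes, threshold=0.9):
    """Return (attribute, value) pairs for which at least `threshold`
    of the hosts sharing that value are frozen."""
    counts = defaultdict(lambda: [0, 0])  # (attr, value) -> [frozen, total]
    for host in inventory:
        for attr in attributes:
            key = (attr, host[attr])
            counts[key][1] += 1
            if host["frozen"]:
                counts[key][0] += 1
    suspects = [
        (key, frozen, total)
        for key, (frozen, total) in counts.items()
        if frozen / total >= threshold
    ]
    # Largest group of frozen hosts first.
    return sorted(suspects, key=lambda item: item[1], reverse=True)

# Hypothetical inventory; every value here is illustrative only.
inventory = [
    {"name": "ws-01", "os": "Windows XP SP2", "dat_version": "5001", "frozen": True},
    {"name": "ws-02", "os": "Windows XP SP2", "dat_version": "5001", "frozen": True},
    {"name": "ws-03", "os": "Windows XP SP3", "dat_version": "5001", "frozen": False},
    {"name": "ws-04", "os": "Windows XP SP2", "dat_version": "5001", "frozen": True},
    {"name": "srv-01", "os": "Windows Server 2003", "dat_version": "5001", "frozen": False},
]

for (attr, value), frozen, total in suspect_attributes(inventory, ["os", "dat_version"]):
    print(f"{attr}={value}: {frozen} of {total} hosts frozen")
```

In this made-up data every host received the same DAT file, so the DAT version alone does not stand out; the operating system level does. Combined with the timing of the last update, that points at an interaction between the new DAT file and one platform rather than at an outside attack.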
Network incident response lessons learned
Some key lessons from that incident I'd like to pass along include the following:
- Have a strong security network: Build and sustain a strong network of fellow security professionals that you can rely upon in case of serious security events. For instance, consider who you would call if you discovered an active botnet in your company. Who within the local FBI or Secret Service offices would you call if you found child pornography, counterfeit intelligence or national security concerns? Have a list of security peers and experts you can contact in a hurry that includes their cell phone numbers, email addresses, personal phone numbers, etc., and keep it handy in your smartphone or as a simple paper list in your wallet.
- Know how to patch and repair the network if access to the Internet is down: This may sound a bit strange, but in the case of my company's DAT file problem, we could easily have had no access to the Internet to download patches, research corrective actions, etc. As such, consider keeping a laptop (with an earlier version of antivirus) equipped with an EVDO or GSM card so you can do some portable troubleshooting and download available patches. You can then move the patches to the internal systems via a USB stick or a CD burned on the laptop to help restore them to normal operation. (Don't forget to check the USB stick or CD for worms and viruses before moving the file around the enterprise.)
- Have an organized, practiced incident response team in place: In the cyber world you must always be ready for a serious security incident, and having an incident response team in place that knows how to work and react as a team is essential. Be sure the team follows the "PICERF" approach: Preparation, Identification, Containment, Eradication, Recovery and Follow-up. A really good -- and free -- resource for cyber incident response is NIST Special Publication 800-61 v1, "Computer Security Incident Handling Guide." Be sure to review the appendices and checklists as well.
- Have a phone bridge ready to use: The incident response team should have a phone bridge ready to activate for major and minor events. This not only allows for more effective cross-discipline communications, but also makes it easy to bring outside experts into the conversation if necessary. You can arrange for these phone bridges with your phone service vendor or check out www.freeconferencecall.com or www.freeconference.com.
- Inform the vendor and your network: An event like the one I described at my employer needs to be communicated to the vendor that caused (or potentially caused) the incident. Also, consider telling the Computer Emergency Response Team (CERT), US-CERT and fellow security professionals in your region or industry. The point here is not necessarily to put your company on report, but instead to help your peers and others in the industry be better prepared should they experience the same problem or incident.
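For the offline patching lesson above, one extra precaution before a downloaded file travels by USB stick or CD is to confirm it arrived intact. The Python sketch below compares a file's SHA-256 digest with a value published by the vendor. It assumes the vendor publishes such a hash and that you obtain it through a separate trusted channel, which the original incident did not involve. This is an integrity check only; it complements the malware scan recommended above and does not replace it.

```python
import hashlib
import hmac

def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large update packages do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, published_hex):
    """Return True if the file's SHA-256 equals the published hex digest.

    `published_hex` is assumed to come from the vendor through a channel
    other than the one used to download the file.
    """
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

A call such as matches_published_hash("corrected-dat.zip", hash_from_vendor), where both the file name and the variable are placeholders, should return True before the file is copied to any internal system.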
Admittedly, a bad DAT file from an antivirus vendor is generally the exception rather than the rule. However, learning from such events can help you be better prepared should any sort of major network outage occur.
About the author: Ernest N. Hayden (Ernie), CISSP, CEH, is the founder and owner of 443 Consulting, LLC, an enterprise focused on providing quality thought leadership in the areas of information security, cybercrime/cyberwarfare, business continuity/disaster recovery planning, and research. Most recently, Ernie was Information Security Strategic Advisor in the Compliance Office at Seattle City Light.