Cybersecurity students and practitioners learn early in their careers that incident management is a foundational aspect of their organization's success. At a minimum, a cybersecurity practitioner needs to understand the details included in the classic incident response lifecycle of preparation; detection and analysis; containment, eradication and recovery; and post-incident activity.
A useful resource on the security incident response process is the "Computer Security Incident Handling Guide" from the National Institute of Standards and Technology (NIST). As noted in the NIST guide, post-incident activity includes using the collected incident data and retained evidence following an incident to create a list of lessons learned.
Other cybersecurity standards used in the industry include requirements for effective security incident response, such as:
- The North American Electric Reliability Corporation Critical Infrastructure Protection (CIP) standard, CIP-008-5, "Cyber Security -- Incident Reporting and Response Planning," lists expectations for cybersecurity incident response planning, implementation, testing and documentation of lessons learned.
- NIST Special Publication 800-53 Revision 4, "Security and Privacy Controls for Federal Information Systems and Organizations," includes discussions and controls to address incident response policy and procedures, incident response training, testing, incident handling, monitoring, reporting, and how to update the incident response plan to address the problems encountered during plan implementation, execution or testing.
- ISO/IEC 27002:2013, "Information technology -- Security techniques -- Code of practice for information security controls," also addresses security incident response and learning from security incidents, as well as the collection of evidence.
- ISA/IEC 62443-2-2, "Implementation Guidance for IACS Security Management System," covers industrial control system security management and includes a security incident response process that's similar to ISO/IEC 27002:2013, in that it requires mechanisms to be in place to learn from security incidents.
With all of these standards, there is a consistent theme of needing a cybersecurity manager to capture the lessons learned following a security incident. However, effectively capturing lessons learned following security incidents -- and tracking and implementing recommendations for improvement -- is usually either not done or not done well.
A key reason for this is that the standards cited above do not include detailed or readily implementable guidance, such as checklists, to conduct a robust post-mortem and to capture lessons learned.
The after-action report
The military and law enforcement routinely close out exercises and major events with a disciplined and focused after-action meeting and report -- or, at a minimum, a hot wash (an immediate debrief held right after the event). Similarly, the business continuity planning/disaster recovery domain must complete after-action reports, especially after major exercises and natural disasters.
The Federal Emergency Management Agency (FEMA) documented the Homeland Security Exercise and Evaluation Program (HSEEP), which provides guiding principles for security exercise program management, design and development, conduct, evaluation and improvement planning.
HSEEP was developed to address and capture good practices for large-scale homeland security exercises -- not cybersecurity incidents. However, its detailed guidance on after-action reporting is what the cybersecurity community needs.
Capturing lessons learned after a security incident
To reiterate, the general guidance that the standards authorities provide for capturing lessons learned following a security incident is rather shallow. Therefore, the following should help cybersecurity managers build a structured approach to conducting and documenting a lessons learned meeting with all the involved parties after a major security incident. Additionally, a focus on listing, implementing and tracking recommendations for improvement is included below.
The plan discussed here can also be used for after-action review and to capture lessons learned.
Setting up the after-action/lessons learned meeting
As a good practice -- and to be consistent with the appropriate standards, if necessary -- a post-mortem review of a security event should be performed to identify and capture the lessons learned.
The meeting should be held promptly, while the details are still fresh and participants can properly reflect on the events. If you wait too long, participants could forget the details of the incident or get caught up in other duties and be unable to make time for a lessons learned meeting.
Give participants the option to respond anonymously, but try to track responses at least by department so that it's clear where response expertise resides.
As far as who to include in the lessons learned meeting, be sure to think beyond the norm. Include people who were present on the day of the incident; the people who brought the systems back online; and other supporting players, such as human resources, legal, public relations, logistics, purchasing, etc., as their reports about the incident may provide a better and more holistic big picture.
A key consideration for the post-mortem is to keep the emphasis on identifying the facts and not placing blame on any individual actions or inactions.
Lessons learned meeting agenda and questions
When you've brought the attendees together, you will want to understand the who, what, where, when, why, how and if aspects of the incident. This is not a precise process, but more of a contributory discussion where participants are expected to offer comments regarding the events and possible causes.
In preparation for the meeting, have a timeline of the event documented and available for a presentation during which attendees can add additional events.
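As a rough illustration of keeping such a timeline editable during the meeting, the sketch below stores events in chronological order and lets an attendee add one that was missed. The `TimelineEvent` class, its fields, the `add_event` helper and all sample entries are hypothetical and invented for this example; they do not come from any of the standards cited above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    when: datetime     # when the event occurred
    source: str        # who or what reported it (hypothetical field)
    description: str   # short statement of fact, without blame


def add_event(timeline, event):
    """Return a new timeline with the event placed in chronological order."""
    return sorted(timeline + [event], key=lambda e: e.when)


# Illustrative, made-up entries prepared before the meeting.
timeline = sorted(
    [
        TimelineEvent(datetime(2024, 3, 4, 9, 15), "SOC analyst", "Alert triaged and escalated"),
        TimelineEvent(datetime(2024, 3, 4, 8, 50), "IDS", "Suspicious outbound traffic alert"),
    ],
    key=lambda e: e.when,
)

# During the meeting, an attendee recalls an earlier event.
timeline = add_event(
    timeline,
    TimelineEvent(datetime(2024, 3, 4, 8, 30), "Help desk", "User reported slow workstation"),
)

for e in timeline:
    print(f"{e.when:%Y-%m-%d %H:%M}  [{e.source}]  {e.description}")
```

Re-sorting on every addition keeps the presentation in order no matter when a participant remembers a detail.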
Some questions to consider for the meeting include the following:
- When was the problem first detected, how, and by whom?
- What was the scope of the incident? Systems affected? Symptoms? Alarms? Alerts?
- What procedures were followed? What procedural steps were missed or omitted, and why?
- Who was in charge of the incident? What was the incident response leadership and management structure? Was this effective, or are changes suggested, and why?
- Were the procedures and directives adequate? If not, what needs to be modified?
- What information was missing during the incident? What are the "I wish I had ..." aspects of the event to be considered for future incident response and supporting procedures?
- Did you take any steps or actions that might have inhibited recovery, and why?
- Did you take any steps or actions that worked well and reduced the impact of the incident, and why? Were these steps mentioned in any procedures or incident response guides?
- How could sharing incident information with other personnel and organizations be improved? What worked particularly well?
- What corrective actions can prevent similar incidents in the future?
- What precursors or indicators should be monitored in the future to detect and prevent similar incidents?
- What additional tools, resources or training are needed to detect, analyze and mitigate future events?
- What could have been done better? How? Why?
- What is the wish list of improvements in the cyber incident response process, procedures, forms, equipment, etc.?
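One possible way to turn questions like those above into a working checklist is sketched below. It records answers as the meeting proceeds and reports which questions are still open. The shortened question list, the `unanswered` helper and the sample answers are assumptions made for illustration only.

```python
# A shortened, paraphrased subset of the agenda questions (illustrative).
QUESTIONS = [
    "When was the problem first detected, how, and by whom?",
    "What was the scope of the incident?",
    "What procedures were followed, and which steps were missed?",
    "What corrective actions can prevent similar incidents in the future?",
]


def unanswered(questions, answers):
    """Return the questions with no recorded answer, in agenda order."""
    return [q for q in questions if not answers.get(q, "").strip()]


# Made-up notes captured during the meeting.
answers = {
    QUESTIONS[0]: "Detected by the help desk after a user report.",
    QUESTIONS[1]: "   ",  # a blank entry still counts as unanswered
}

for q in unanswered(QUESTIONS, answers):
    print("Still open:", q)
```

Reviewing the open questions before closing the meeting helps ensure no agenda item is skipped.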
It is important to remember that the purpose of after-action reporting is to capture the team's strengths, weaknesses, lessons learned and corrective actions for the security incident response process.
An example of a detailed after-action reporting template was included in FEMA's HSEEP.
Improvement plan tracking
A key deliverable from the after-action reporting and lessons learned discussions is a list of the action items that need to be implemented in order to improve the security incident response process. These recommendations need to be tracked to completion and recorded as such. Unfortunately, many organizations don't track their incident response lessons learned and recommendations to completion.
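Tracking recommendations to completion can be as simple as a list of action items with an owner, a due date and a completion date. The following is a minimal sketch under that assumption; the `ActionItem` class, its fields and the sample items are hypothetical and are not taken from HSEEP or any cited standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    description: str
    owner: str                           # person or team accountable
    due: date
    completed_on: Optional[date] = None  # None until the item is closed

    def is_open(self) -> bool:
        return self.completed_on is None

    def is_overdue(self, today: date) -> bool:
        return self.is_open() and today > self.due


def open_items(items):
    """Items that have not yet been recorded as complete."""
    return [i for i in items if i.is_open()]


def overdue_items(items, today):
    """Open items whose due date has passed."""
    return [i for i in items if i.is_overdue(today)]


# Illustrative, made-up recommendations from a lessons learned meeting.
items = [
    ActionItem("Update escalation contact list", "SOC lead", date(2024, 4, 1)),
    ActionItem("Add alert for unusual outbound traffic", "Network team", date(2024, 5, 15)),
    ActionItem("Run tabletop exercise on new procedure", "IR manager", date(2024, 3, 20),
               completed_on=date(2024, 3, 18)),
]

today = date(2024, 4, 10)
for item in overdue_items(items, today):
    print(f"OVERDUE: {item.description} (owner: {item.owner}, due {item.due})")
```

Recording a completion date, rather than deleting closed items, preserves the evidence that each recommendation was actually implemented.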
The security incident response process goes beyond simply preparing for an event, detecting a cyberattack, analyzing a situation, and then containing and eradicating the threat. An effective organization recognizes that the security incident response lifecycle requires serious attention to post-incident reviews and analyses, such as lessons learned meetings and after-action reporting, with a primary emphasis on fixing the issues discovered during the event so they won't negatively impact the response to the next cyberattack.