Incident Response Plan

Updated Jun 19, 2025

Comprehensive framework for detecting, responding to, and recovering from security incidents with detailed procedures and playbooks.

Incident Response Overview

An effective incident response plan minimizes damage, reduces recovery time and costs, and ensures lessons learned improve future security posture.

Incident Response Phases

  1. Preparation: Establish and train incident response team
  2. Detection & Analysis: Identify and validate incidents
  3. Containment: Limit damage and prevent spread
  4. Eradication: Remove threat from environment
  5. Recovery: Restore normal operations
  6. Post-Incident: Document lessons learned

Incident Response Team Structure

Core Team Roles

Role | Responsibilities | Primary Contact | Backup Contact
Incident Commander | Overall incident coordination, external communication | CTO | VP Engineering
Security Lead | Technical investigation, forensics | Security Manager | Sr. Security Engineer
Operations Lead | System remediation, recovery | DevOps Manager | Sr. SRE
Communications Lead | Internal/external communications | PR Manager | Marketing Director
Legal Advisor | Legal guidance, compliance | General Counsel | External Counsel

Escalation Matrix

  • Severity 1 (Critical): Immediate - All hands on deck
  • Severity 2 (High): Within 30 minutes - Core team
  • Severity 3 (Medium): Within 2 hours - Security team
  • Severity 4 (Low): Next business day - On-call engineer
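
The matrix above can be encoded as a lookup so that paging or ticketing tools compute a response deadline automatically. The sketch below is illustrative only: the names `ESCALATION`, `next_business_day`, and `response_deadline` are hypothetical, and the rule that "next business day" means 09:00 on the next Monday-to-Friday day is an assumption the plan does not state.

```python
from datetime import datetime, timedelta

# Severity level -> (response window, responders), taken from the matrix above.
# A window of None stands for "next business day" and is resolved below.
ESCALATION = {
    1: (timedelta(0), "All hands"),
    2: (timedelta(minutes=30), "Core team"),
    3: (timedelta(hours=2), "Security team"),
    4: (None, "On-call engineer"),
}


def next_business_day(ts: datetime) -> datetime:
    """Return 09:00 on the next Monday-to-Friday day after ts (assumed rule)."""
    day = ts + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day.replace(hour=9, minute=0, second=0, microsecond=0)


def response_deadline(severity: int, detected_at: datetime) -> tuple[datetime, str]:
    """Return (deadline, responders) for a severity level from 1 to 4."""
    window, responders = ESCALATION[severity]
    if window is None:
        return next_business_day(detected_at), responders
    return detected_at + window, responders


if __name__ == "__main__":
    detected = datetime(2025, 6, 20, 16, 45)  # a Friday afternoon
    for level in sorted(ESCALATION):
        deadline, who = response_deadline(level, detected)
        print(f"Severity {level}: {who} by {deadline:%Y-%m-%d %H:%M}")
```

Public holidays are not handled; a real deployment would need a holiday calendar for the Severity 4 case.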

Detection and Analysis

Incident Classification

Type | Indicators | Initial Response
Malware | AV alerts, unusual processes, C2 traffic | Isolate system, capture memory
Data Breach | Large data transfers, DB queries, access anomalies | Block egress, audit access
Account Compromise | Failed logins, privilege escalation, unusual access | Disable account, reset credentials
DDoS Attack | Traffic spike, service degradation | Enable DDoS mitigation, scale resources
Ransomware | Encryption activity, ransom notes | Disconnect systems, activate DR
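
For automated triage, the classification table can be held as a simple lookup. This is a minimal sketch: `INITIAL_RESPONSE` and `initial_response` are hypothetical names, and the fallback step for unrecognized incident types is an assumption that does not appear in the table.

```python
# Incident type -> initial response steps, copied from the classification table.
INITIAL_RESPONSE = {
    "malware": ["Isolate system", "Capture memory"],
    "data breach": ["Block egress", "Audit access"],
    "account compromise": ["Disable account", "Reset credentials"],
    "ddos attack": ["Enable DDoS mitigation", "Scale resources"],
    "ransomware": ["Disconnect systems", "Activate DR"],
}

# Assumed fallback for types the table does not cover.
UNKNOWN_TYPE_RESPONSE = ["Escalate to Security Lead for manual triage"]


def initial_response(incident_type: str) -> list[str]:
    """Return the first-response steps for an incident type (case-insensitive)."""
    key = incident_type.strip().lower()
    return INITIAL_RESPONSE.get(key, UNKNOWN_TYPE_RESPONSE)


if __name__ == "__main__":
    print(initial_response("Ransomware"))
```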

Incident Severity Determination

# Severity calculation matrix
def calculate_severity(incident):
    severity_score = 0
    
    # Data classification (0-3 points)
    if incident.data_classification == 'restricted':
        severity_score += 3
    elif incident.data_classification == 'confidential':
        severity_score += 2
    elif incident.data_classification == 'internal':
        severity_score += 1
    
    # Scope (0-3 points)
    if incident.systems_affected > 100:
        severity_score += 3
    elif incident.systems_affected > 10:
        severity_score += 2
    elif incident.systems_affected > 1:
        severity_score += 1
    
    # Business impact (0-4 points)
    if incident.revenue_impact or incident.compliance_violation:
        severity_score += 4
    elif incident.productivity_impact > 50:
        severity_score += 3
    elif incident.productivity_impact > 10:
        severity_score += 2
    elif incident.productivity_impact > 0:
        severity_score += 1
    
    # Determine severity level
    if severity_score >= 8:
        return 'CRITICAL'
    elif severity_score >= 6:
        return 'HIGH'
    elif severity_score >= 3:
        return 'MEDIUM'
    else:
        return 'LOW'
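
The function above reads five attributes from its argument but the input type is not defined. A minimal sketch of one possible input record follows; the `Incident` dataclass, its defaults, and the reading of `productivity_impact` as a percentage are assumptions rather than part of the plan.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical input record exposing the attributes calculate_severity reads."""
    data_classification: str = "public"  # 'restricted', 'confidential', 'internal', or other
    systems_affected: int = 0
    revenue_impact: bool = False
    compliance_violation: bool = False
    productivity_impact: float = 0.0     # assumed to be a percentage, 0-100


# Worked example: restricted data (3) + 25 systems (2) + compliance violation (4)
# gives a score of 9, which meets the >= 8 threshold for 'CRITICAL'.
breach = Incident(
    data_classification="restricted",
    systems_affected=25,
    compliance_violation=True,
)
```

Passing `breach` to `calculate_severity` would follow the scoring shown in the comment. Note that the level names returned by the function correspond to Severity 1 through 4 in the escalation matrix.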

Containment Strategies

Immediate Containment

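#!/bin/bash
# Emergency containment script
# Assumes ISOLATION_SG_ID, WAF_IPSET_NAME, WAF_IPSET_ID, NACL_ID, and
# NACL_RULE_NUMBER are set in the environment, and that get_instance_id
# is defined elsewhere.

# Function to isolate compromised system
isolate_system() {
    local host=$1
    echo "Isolating system: $host"

    # Preserve evidence first: after the DROP policies below are applied,
    # new SSH sessions to the host will fail.
    # /dev/mem is restricted on most modern kernels, so a dedicated memory
    # acquisition tool (for example LiME or AVML) is usually needed instead.
    ssh "$host" "sudo dd if=/dev/mem of=/tmp/memory.dump"
    ssh "$host" "sudo netstat -anp > /tmp/network_connections.txt"
    ssh "$host" "sudo ps aux > /tmp/processes.txt"

    # Network isolation, applied in one session because the session itself
    # may be cut off once the policies change
    ssh "$host" "sudo iptables -A INPUT -i lo -j ACCEPT; \
        sudo iptables -A OUTPUT -o lo -j ACCEPT; \
        sudo iptables -P INPUT DROP; \
        sudo iptables -P OUTPUT DROP"

    # Disable remote access by replacing the instance's security groups
    # with a quarantine group that has no inbound rules
    aws ec2 modify-instance-attribute \
        --instance-id "$(get_instance_id "$host")" \
        --groups "$ISOLATION_SG_ID"
}

# Function to block malicious IPs
block_malicious_ip() {
    local ip=$1
    echo "Blocking IP: $ip"

    # Update WAF rules. update-ip-set replaces the whole address list and
    # requires the current lock token, so read the existing set first.
    local lock_token current
    lock_token=$(aws wafv2 get-ip-set --scope REGIONAL \
        --name "$WAF_IPSET_NAME" --id "$WAF_IPSET_ID" \
        --query 'LockToken' --output text)
    current=$(aws wafv2 get-ip-set --scope REGIONAL \
        --name "$WAF_IPSET_NAME" --id "$WAF_IPSET_ID" \
        --query 'IPSet.Addresses' --output text)
    # $current is left unquoted on purpose so each address is a separate argument
    aws wafv2 update-ip-set --scope REGIONAL \
        --name "$WAF_IPSET_NAME" --id "$WAF_IPSET_ID" \
        --lock-token "$lock_token" \
        --addresses $current "$ip/32"

    # Security groups are allow-only and cannot deny an address, so add a
    # deny entry to the subnet's network ACL instead
    aws ec2 create-network-acl-entry \
        --network-acl-id "$NACL_ID" \
        --ingress --rule-number "$NACL_RULE_NUMBER" \
        --protocol=-1 --rule-action deny \
        --cidr-block "$ip/32"
}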

Long-term Containment

  • Implement additional monitoring on affected systems
  • Deploy honeypots to detect lateral movement
  • Increase logging verbosity
  • Implement temporary access restrictions
  • Enable enhanced authentication requirements

Eradication Procedures

Malware Removal

  1. Identify all infected systems through IoC scanning
  2. Remove malware files and registry entries
  3. Patch vulnerabilities exploited by malware
  4. Reset all potentially compromised credentials
  5. Verify complete removal through multiple scans
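
Step 1 can be partly automated by comparing file hashes against known indicators of compromise. The sketch below covers hash-based IoCs only (not registry entries, network indicators, or in-memory artifacts); `sha256_of` and `scan_for_iocs` are hypothetical helper names, and the set of known-bad hashes is assumed to come from a threat intelligence feed or the investigation itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_for_iocs(root: Path, known_bad_hashes: set[str]) -> list[Path]:
    """Return regular files under root whose SHA-256 is in known_bad_hashes."""
    matches = []
    for path in sorted(root.rglob("*")):
        # Skip directories and symlinks; only regular files are hashed.
        if path.is_symlink() or not path.is_file():
            continue
        try:
            if sha256_of(path) in known_bad_hashes:
                matches.append(path)
        except OSError:
            # Unreadable files are skipped here; a real scan should record them.
            continue
    return matches
```

Running the same scan again after cleanup is one way to support step 5, although hash matching alone will miss malware that changes its own contents.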

Account Compromise Response

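# Account compromise remediation
# Skeleton only: AuditLogger and the helper methods called below
# (disable_account, revoke_api_keys, and so on) are integration points
# with the identity and endpoint systems and are not defined here.
class AccountRemediation:
    def __init__(self):
        self.audit_log = AuditLogger()

    def remediate_compromised_account(self, username):
        """Complete remediation for compromised account"""
        # Disable account immediately
        self.disable_account(username)

        # Terminate all active sessions
        sessions = self.get_active_sessions(username)
        for session in sessions:
            self.terminate_session(session)

        # Reset credentials; the temporary password must reach the verified
        # account owner out of band and must never be written to logs
        temp_password = self.generate_secure_password()
        self.reset_password(username, temp_password)

        # Revoke all tokens and keys
        self.revoke_api_keys(username)
        self.revoke_oauth_tokens(username)

        # Audit recent activity
        activities = self.audit_account_activity(username, days=30)

        # Check for persistence mechanisms and keep the results for review
        # (assumes each check_* helper returns its findings)
        persistence_findings = {
            'scheduled_tasks': self.check_scheduled_tasks(username),
            'startup_items': self.check_startup_items(username),
            'ssh_keys': self.check_ssh_keys(username),
        }

        # Log remediation actions
        self.audit_log.log_remediation(username, activities)

        return {
            'status': 'remediated',
            'actions_taken': self.get_remediation_summary(username),
            'requires_review': activities,
            'persistence_findings': persistence_findings
        }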

Recovery Procedures

System Recovery Checklist

  • □ Verify threat completely removed
  • □ Restore from clean backups if needed
  • □ Apply all security patches
  • □ Harden system configuration
  • □ Test functionality thoroughly
  • □ Monitor closely for 48 hours
  • □ Gradually restore normal access

Service Restoration Priority

  1. Tier 1: Authentication services, core infrastructure
  2. Tier 2: Customer-facing applications
  3. Tier 3: Internal tools and services
  4. Tier 4: Development and test environments
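
The priority list can drive the order in which recovery tooling brings services back. This is a minimal sketch; the service names in `SERVICE_TIERS` are hypothetical examples and `restoration_batches` is an illustrative helper, not an existing tool.

```python
from collections import defaultdict

# Service name -> tier, following the priority list above.
# The service names are hypothetical examples.
SERVICE_TIERS = {
    "sso": 1,
    "dns": 1,
    "storefront": 2,
    "wiki": 3,
    "staging": 4,
}


def restoration_batches(service_tiers: dict[str, int]) -> list[list[str]]:
    """Group services into batches, lowest tier number first."""
    by_tier = defaultdict(list)
    for name, tier in service_tiers.items():
        by_tier[tier].append(name)
    # Names are sorted within a batch only to make the output deterministic.
    return [sorted(by_tier[tier]) for tier in sorted(by_tier)]


if __name__ == "__main__":
    for number, batch in enumerate(restoration_batches(SERVICE_TIERS), start=1):
        print(f"Batch {number}: {', '.join(batch)}")
```

Restoring in batches makes it possible to verify each tier before the next one starts, which matches the gradual restoration called for in the recovery checklist.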

Communication Templates

Internal Communication

Subject: [SEVERITY] Security Incident - [INCIDENT ID]

Team,

We have detected a security incident requiring immediate attention.

Incident Type: [TYPE]
Severity: [CRITICAL/HIGH/MEDIUM/LOW]
Affected Systems: [SYSTEMS]
Current Status: [INVESTIGATING/CONTAINED/RESOLVED]

Immediate Actions Required:
- [ACTION 1]
- [ACTION 2]

Do not discuss this incident outside of authorized channels.

Incident Commander: [NAME]
Bridge Line: [PHONE]
Slack Channel: #incident-[ID]
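
Filling the template by hand during an incident invites leftover placeholders. A minimal sketch of programmatic rendering follows, using an abbreviated version of the template above; the field names and the `render_internal_notice` helper are assumptions chosen for illustration.

```python
from string import Template

# Abbreviated form of the internal template; $-fields replace the [BRACKETS].
INTERNAL_NOTICE = Template(
    "Subject: [$severity] Security Incident - $incident_id\n"
    "\n"
    "Incident Type: $incident_type\n"
    "Severity: $severity\n"
    "Affected Systems: $systems\n"
    "Current Status: $status\n"
    "\n"
    "Incident Commander: $commander\n"
    "Slack Channel: #incident-$incident_id\n"
)


def render_internal_notice(**fields: str) -> str:
    """Render the notice; substitute() raises KeyError if any field is missing."""
    return INTERNAL_NOTICE.substitute(**fields)


if __name__ == "__main__":
    print(render_internal_notice(
        severity="HIGH",
        incident_id="IR-2025-001",
        incident_type="Account Compromise",
        systems="auth-service",
        status="INVESTIGATING",
        commander="Jane Doe",
    ))
```

`substitute` is used rather than `safe_substitute` so that a missing field fails loudly instead of sending a message with an unfilled placeholder.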

Customer Communication

Subject: Important Security Update

Dear Customer,

We recently detected [general description] affecting [scope].

What Happened:
[Brief, factual description without technical details]

What Information Was Involved:
[Specific data types if any]

What We Are Doing:
[List of actions taken]

What You Should Do:
[Specific customer actions if needed]

For More Information:
[Contact information]

We take security seriously and apologize for any inconvenience.

Sincerely,
[Company Leadership]

Evidence Collection

Forensic Data Collection

#!/bin/bash
# Forensic evidence collection script
# Run as root so that process names, firewall rules, and open files are visible.

CASE_ID=$1
if [ -z "$CASE_ID" ]; then
    echo "Usage: $0 CASE_ID" >&2
    exit 1
fi
EVIDENCE_DIR="/forensics/$CASE_ID"
mkdir -p "$EVIDENCE_DIR"

# Collect system information (timestamp recorded in UTC)
echo "Collecting system information..."
date -u > "$EVIDENCE_DIR/collection_time.txt"
uname -a > "$EVIDENCE_DIR/system_info.txt"
uptime > "$EVIDENCE_DIR/uptime.txt"

# Collect memory dump
# /dev/mem is restricted on most modern kernels; a dedicated acquisition
# tool (for example LiME or AVML) is usually needed instead.
echo "Dumping memory..."
sudo dd if=/dev/mem of="$EVIDENCE_DIR/memory.dump" bs=1M

# Collect network information
echo "Collecting network data..."
netstat -antp > "$EVIDENCE_DIR/network_connections.txt"
iptables -L -n -v > "$EVIDENCE_DIR/firewall_rules.txt"
ss -tulpn > "$EVIDENCE_DIR/listening_ports.txt"
arp -a > "$EVIDENCE_DIR/arp_cache.txt"

# Collect process information
echo "Collecting process data..."
ps auxww > "$EVIDENCE_DIR/processes.txt"
lsof -n > "$EVIDENCE_DIR/open_files.txt"
pstree -p > "$EVIDENCE_DIR/process_tree.txt"

# Collect user information
echo "Collecting user data..."
w > "$EVIDENCE_DIR/logged_in_users.txt"
last -50 > "$EVIDENCE_DIR/login_history.txt"
cat /etc/passwd > "$EVIDENCE_DIR/users.txt"

# Create hash of evidence (the hash file itself is excluded)
echo "Creating evidence hash..."
find "$EVIDENCE_DIR" -type f ! -name evidence_hashes.txt \
    -exec sha256sum {} + > "$EVIDENCE_DIR/evidence_hashes.txt"

echo "Evidence collection complete: $EVIDENCE_DIR"

Chain of Custody

  • Document who collected evidence and when
  • Use write-once media when possible
  • Calculate cryptographic hashes
  • Maintain access logs for evidence
  • Store in secure, tamper-evident location
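
The first, third, and fourth points above can be combined in an append-only custody log. The sketch below assumes a JSON Lines file as the log format; `custody_record`, `append_custody_log`, and `verify_evidence` are hypothetical names, and the log file itself would still need to live in tamper-evident storage.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; memory dumps can be far larger than RAM allows."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(evidence: Path, handler: str, action: str = "collected") -> dict:
    """Describe one custody event: who did what to which file, and when (UTC)."""
    return {
        "file": str(evidence),
        "sha256": file_sha256(evidence),
        "handler": handler,
        "action": action,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
    }


def append_custody_log(log_path: Path, record: dict) -> None:
    """Append one JSON line per event; earlier lines are never rewritten."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


def verify_evidence(evidence: Path, record: dict) -> bool:
    """Check that the file still matches the hash recorded at collection."""
    return file_sha256(evidence) == record["sha256"]
```

Calling `verify_evidence` before each analysis step gives a simple check that the file has not changed since it was logged.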

Post-Incident Activities

Lessons Learned Meeting

Conduct within 5 business days of incident closure:

  • What went well?
  • What could be improved?
  • Were procedures followed?
  • Were there any tool/process gaps?
  • What preventive measures are needed?

Post-Incident Report Template

  1. Executive Summary
  2. Incident Timeline
  3. Root Cause Analysis
  4. Impact Assessment
  5. Response Effectiveness
  6. Recommendations
  7. Action Items

Incident Response Metrics

Key Performance Indicators

Metric | Target | Measurement
Mean Time to Detect (MTTD) | < 1 hour | Time from compromise to detection
Mean Time to Respond (MTTR) | < 2 hours | Time from detection to containment
Mean Time to Recovery | < 4 hours | Time from containment to recovery
Incident Recurrence Rate | < 5% | Repeat incidents within 90 days
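
The three time-based metrics can be computed from four timestamps per incident. The following is a minimal sketch under that assumption; `IncidentTimeline`, `response_metrics`, and `missed_targets` are hypothetical names, and the recurrence rate is left out because it needs incident-linking data the table does not describe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class IncidentTimeline:
    compromised_at: datetime
    detected_at: datetime
    contained_at: datetime
    recovered_at: datetime


# Targets from the KPI table; each metric must be strictly below its target.
TARGETS = {
    "mttd": timedelta(hours=1),
    "mttr": timedelta(hours=2),
    "mean_time_to_recovery": timedelta(hours=4),
}


def _mean_delta(deltas: list[timedelta]) -> timedelta:
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def response_metrics(timelines: list[IncidentTimeline]) -> dict[str, timedelta]:
    """Average each interval across incidents, as defined in the table."""
    return {
        "mttd": _mean_delta([t.detected_at - t.compromised_at for t in timelines]),
        "mttr": _mean_delta([t.contained_at - t.detected_at for t in timelines]),
        "mean_time_to_recovery": _mean_delta(
            [t.recovered_at - t.contained_at for t in timelines]
        ),
    }


def missed_targets(metrics: dict[str, timedelta]) -> list[str]:
    """Return the names of metrics that are not strictly below their target."""
    return [name for name, value in metrics.items() if value >= TARGETS[name]]
```

In practice the compromise time is often only known after forensic analysis, so MTTD is usually computed once an incident is closed.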

Testing and Maintenance

Tabletop Exercises

  • Quarterly scenario-based discussions
  • Annual full-scale simulation
  • Monthly tool and contact verification
  • Post-exercise improvement implementation

Plan Maintenance

  • Review and update quarterly
  • Update after each incident
  • Annual comprehensive review
  • Track and implement improvements

Quick Reference

Emergency Contacts

Critical Commands

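# Block IP immediately (inserted as rule 1 so earlier ACCEPT rules cannot match first)
iptables -I INPUT 1 -s [IP] -j DROP

# Disable user account (lock the password and expire the account so key-based logins fail too)
usermod -L -e 1 [username]

# Kill all user processes
pkill -KILL -u [username]

# Capture network traffic
tcpdump -i any -w /tmp/capture.pcap

# Check for rootkits
rkhunter --check
chkrootkit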

Note: This documentation is provided for reference purposes only. It reflects general best practices and industry-aligned guidelines, and any examples, claims, or recommendations are intended as illustrative, not definitive or binding.